IN THE UNITED STATES PATENT AND TRADEMARK OFFICE



In re Patent Application of 

HERMON-TAYLOR et al Atty. Ref.: 117-260

Serial No. (To Be Assigned) Group: 

Filed: 19 June 1998 Examiner: 

For: NOVEL POLYNUCLEOTIDES AND 
POLYPEPTIDES IN PATHOGENIC 
MYCOBACTERIA AND THEIR USE AS 
DIAGNOSTICS, VACCINES AND 
TARGETS FOR CHEMOTHERAPY 



June 19, 1998 

Honorable Commissioner of Patents 

and Trademarks 
Washington, DC 20231 

Sir: 

PRELIMINARY AMENDMENT 

In order to place the above-identified application in better condition for 
examination, please amend the application as follows: 



IN THE CLAIMS 

Claim 4, lines 2 and 3, change "any one of claims 1 to 3" to --Claim 1 or 2--.

Claim 8, line 3, change "any one of claims 4 to 7" to --Claim 4--.

Claim 9, line 2, change "any one of claims 4 to 7" to --Claim 4--.

Claim 10, line 2, change "any one of claims 1 to 3" to --Claim 1 or 2--.

Please cancel claim 12 without prejudice.

Claim 13, lines 4 and 5, change "any one of claims 1 to 3" to --Claim 1 or 2--.

Claim 14, line 2, change "any one of claims 1 to 3" to --Claim 1 or 2--.

Claim 15, line 3, change "claims 1 to 3" to --Claim 1 or 2--.

Claim 16, line 2, change "any one of claims 1 to 3" to --Claim 1 or 2--.

Claim 18, line 3, change "claims 1 to 3" to --Claim 1 or 2--.

Please delete Claim 19 without prejudice.

Claim 20, line 1, change "claims 18 or 19" to --Claim 18--.

Claim 21, lines 3 and 4, change "any one of claims 1 to 3" to --Claim 1 or 2--.

REMARKS 

The above amendments are made to place the claims in a more traditional 
format. 






WO 97/23624








Novel polynucleotides and polypeptides in pathogenic mycobacteria
and their use as diagnostics, vaccines and targets for
chemotherapy.

This invention relates to the novel polynucleotide sequence we
have designated "GS" which we have identified in pathogenic
mycobacteria. GS is a pathogenicity island within 8kb of DNA
comprising a core region of 5.75kb and an adjacent transmissable
element within 2.25kb. GS is contained within Mycobacterium
paratuberculosis, Mycobacterium avium subsp. silvaticum and some
pathogenic isolates of M. avium. Functional portions of the core
region of GS are also represented by regions with a high degree
of homology that we have identified in cosmids containing genomic
DNA from Mycobacterium tuberculosis.

Background to the invention

Mycobacterium tuberculosis (Mtb) is a major cause of global
diseases of humans as well as animals. Although conventional
methods of diagnosis including microscopy, culture and skin
testing exist for the recognition of these diseases, improved
methods, particularly new immunodiagnostics and DNA-based
detection systems, are needed. Drugs used to treat tuberculosis
are increasingly encountering the problem of resistant organisms.
New drugs targeted at specific pathogenicity determinants as well
as new vaccines for the prevention and treatment of tuberculosis
are required. The importance of Mtb as a global pathogen is
reflected in the commitment being made to sequencing the entire
genome of this organism. This has generated a large amount of
DNA sequence data of genomic DNA within cosmid and other
libraries. Although the DNA sequence is known in the art, the
functions of the vast majority of these sequences, the proteins
they encode, the biological significance of these proteins, and
the overall relevance and use of these genes and their products
as diagnostics, vaccines and targets for chemotherapy for
tuberculous disease, remain entirely unknown.



Mycobacterium avium subsp. silvaticum (Mavs) is a pathogenic
mycobacterium causing diseases of animals and birds, but it can
also affect humans. Mycobacterium paratuberculosis (Mptb) causes
chronic inflammation of the intestine in many species of animals
including primates and can also cause Crohn's disease in humans.
Mptb is associated with other chronic inflammatory diseases of
humans such as sarcoidosis. Subclinical Mptb infection is
widespread in domestic livestock and is present in milk from
infected animals. The organism is more resistant to
pasteurisation than Mtb and can be conveyed to humans in retail
milk supplies. Mptb is also present in water supplies,
particularly those contaminated with run-off from heavily grazed
pastures. Mptb and Mavs contain the insertion elements IS900 and
IS902 respectively, and these are linked to pathogenicity in
these organisms. IS900 and IS902 provide convenient highly
specific multi-copy DNA targets for the sensitive detection of
these organisms using DNA-based methods and for the diagnosis of
infections in animals and humans. Much improvement is however
required in the immunodiagnosis of Mptb and Mavs infections in
animals and humans. Mptb and Mavs are, in general, resistant in
vivo to standard anti-tuberculous drugs. Although substantial
clinical improvements in infections caused by Mptb, such as
Crohn's disease, may result from treatment of patients with
combinations of existing drugs such as Rifabutin, Clarithromycin
or Azithromycin, additional effective drug treatments are
required. Furthermore, there is an urgent need for effective
vaccines for the prevention and treatment of Mptb and Mavs
infections in animals and humans based upon the recognition of
specific pathogenicity determinants.

Pathogenicity islands are, in general, 7-9kb regions of DNA
comprising a core domain with multiple ORFs and an adjacent
transmissable element. The transmissable element also encodes
proteins which may be linked to pathogenicity, such as by
providing receptors for cellular recognition. Pathogenicity
islands are envisaged as mobile packages of DNA which, when they
enter an organism, assist in bringing about its conversion from
a non-disease-causing to a disease-causing strain.

Description of the Drawings 
Figure 1(a) and (b) shows a linear map of the pathogenicity
island GS in Mavs (Fig 1a) and in Mptb (Fig 1b). The main open
reading frames are illustrated as ORFs A to H. ORFs A to F are
found within the core region of GS. ORFs G and H are encoded by
the adjacent transmissable element portion of GS.

Disclosure of the invention 

Using a DNA-based differential analysis technology we have
discovered and characterised a novel polynucleotide in Mptb
(isolates 0022 from a Guernsey cow and 0021 from a red deer).
This polynucleotide comprises the gene region we have designated
GS. GS is found in Mptb using the identifier DNA sequences
Seq. ID No 1 and 2 where Seq. ID No 2 is the complementary
sequence of Seq. ID No 1. GS is also identified in Mavs. The
complete DNA sequence incorporating the positive strand of GS
from an isolate of Mavs comprising 7995 nucleotides, including
the core region of GS and adjacent transmissable element, is
given in Seq. ID No 3. DNA sequence comprising 4435 bp of the
positive strand of GS obtained from an isolate of Mptb including
the core region of GS (nucleotides 1614 to 6047 of GS in Mavs)
is given in Seq. ID No 4. The DNA sequence of GS from Mptb is
highly (99.4%) homologous to GS in Mavs. The remaining portion
of the DNA sequence of GS in Mptb is readily obtainable by a
person skilled in the art using standard laboratory procedures.
The entire functional DNA sequence including core region and
transmissable element of GS in Mptb and Mavs as described above
comprises the polynucleotide sequences of the invention.

There are 8 open reading frames (ORFs) in GS. Six of these
designated GSA, GSB, GSC, GSD, GSE and GSF are encoded by the
core DNA region of GS which, characteristically for a
pathogenicity island, has a different GC content than the rest
of the microbial genome. Two ORFs designated GSG and GSH are
encoded by the transmissable element of GS whose GC content
resembles that of the rest of the mycobacterial genome. The ORF
GSH comprises two sub-ORFs, H1 and H2, on the complementary DNA
strand linked by a programmed frameshifting site so that a single
polypeptide is translated from the ORF GSH. The nucleotide
sequences of the 8 ORFs in GS and their translations are shown
in Seq. ID No 5 to Seq. ID No 29 as follows:



ORF A:  Seq. ID No 5   Nucleotides 50 to 427 of GS from Mavs
        Seq. ID No 6   Amino acid sequence encoded by Seq. ID No 5.

ORF B:  Seq. ID No 7   Nucleotides 772 to 1605 of GS from Mavs
        Seq. ID No 8   Amino acid sequence encoded by Seq. ID No 7.

ORF C:  Seq. ID No 9   Nucleotides 1814 to 2845 of GS from Mavs
        Seq. ID No 10  Amino acid sequence encoded by Seq. ID No 9.
        Seq. ID No 11  Nucleotides 201 to 1232 of GS from Mptb
        Seq. ID No 12  Amino acid sequence encoded by Seq. ID No 11.

ORF D:  Seq. ID No 13  Nucleotides 2785 to 3804 of GS from Mavs
        Seq. ID No 14  Amino acid sequence encoded by Seq. ID No 13.
        Seq. ID No 15  Nucleotides 1172 to 2191 of GS from Mptb
        Seq. ID No 16  Amino acid sequence encoded by Seq. ID No 15.

ORF E:  Seq. ID No 17  Nucleotides 4080 to 4802 of GS from Mavs
        Seq. ID No 18  Amino acid sequence encoded by Seq. ID No 17.
        Seq. ID No 19  Nucleotides 2467 to 3189 of GS from Mptb
        Seq. ID No 20  Amino acid sequence encoded by Seq. ID No 19.

ORF F:  Seq. ID No 21  Nucleotides 4947 to 5747 of GS from Mavs
        Seq. ID No 22  Amino acid sequence encoded by Seq. ID No 21.
        Seq. ID No 23  Nucleotides 3335 to 4135 of GS from Mptb
        Seq. ID No 24  Amino acid sequence encoded by Seq. ID No 23.

ORF G:  Seq. ID No 25  Nucleotides 6176 to 7042 of GS from Mavs
        Seq. ID No 26  Amino acid sequence encoded by Seq. ID No 25.

ORF H:  Seq. ID No 27  Nucleotides 7953 to 6215 from Mavs.

ORF H1: Seq. ID No 28  Amino acid sequence encoded by
                       nucleotides 7953 to 7006 of Seq. ID No 27

ORF H2: Seq. ID No 29  Amino acid sequence encoded by
                       nucleotides 7009 to 6215 of Seq. ID No 27
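
As a quick consistency check on the Mavs coordinates listed above, the following sketch computes the length and strand of each ORF. It assumes the coordinates are 1-based and inclusive, and that a start position greater than the end position denotes the complementary strand. The names `MAVS_ORFS`, `orf_span` and `codon_count` are illustrative only and do not come from the specification.

```python
# Mavs coordinates copied from the listing above (assumed 1-based, inclusive).
MAVS_ORFS = {
    "A": (50, 427),
    "B": (772, 1605),
    "C": (1814, 2845),
    "D": (2785, 3804),
    "E": (4080, 4802),
    "F": (4947, 5747),
    "G": (6176, 7042),
    "H": (7953, 6215),   # full span of GSH on the complementary strand
    "H1": (7953, 7006),
    "H2": (7009, 6215),
}


def orf_span(start, end):
    """Return (length in nucleotides, strand) for an inclusive coordinate pair."""
    length = abs(end - start) + 1
    strand = "+" if end >= start else "-"
    return length, strand


def codon_count(start, end):
    """Return the number of codons spanned, or raise if the span is not in frame."""
    length, _ = orf_span(start, end)
    if length % 3 != 0:
        raise ValueError(f"span of {length} nt is not a whole number of codons")
    return length // 3


if __name__ == "__main__":
    for name, (start, end) in MAVS_ORFS.items():
        length, strand = orf_span(start, end)
        print(f"ORF {name}: {length} nt, strand {strand}, in frame: {length % 3 == 0}")
```

Under these assumptions ORFs A to G, H1 and H2 each span a whole number of codons, whereas the full H span (7953 to 6215) does not, as would be expected for two sub-ORFs linked by a frameshift.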



The polynucleotides in Mtb with homology to the ORFs B, C, E and
F of GS in Mptb and Mavs, and the polypeptides they are now known
to encode as a result of our invention, are as follows:

ORF B:  Seq. ID No 30  Cosmid MTCY277 nucleotides 35493 to 34705
        Seq. ID No 31  Amino acid sequence encoded by Seq. ID No 30.

ORF C:  Seq. ID No 32  Cosmid MTCY277 nucleotides 31972 to 32994
        Seq. ID No 33  Amino acid sequence encoded by Seq. ID No 32.

ORF E:  Seq. ID No 34  Cosmid MTCY277 nucleotides 34687 to 33956
        Seq. ID No 35  Amino acid sequence encoded by Seq. ID No 34.

ORF E:  Seq. ID No 36  Cosmid MT024 nucleotides 15934 to 15203
        Seq. ID No 37  Amino acid sequence encoded by Seq. ID No 36.

ORF F:  Seq. ID No 38  Cosmid MT024 nucleotides 15133 to 14306
        Seq. ID No 39  Amino acid sequence encoded by Seq. ID No 38.
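
The relationship between a coordinate pair in the listing above and the amino acid sequence it encodes can be sketched as follows. This is an illustration under stated assumptions, not the method used to generate the sequence listing: coordinates are taken to be 1-based and inclusive, a start greater than the end is taken to mean the complementary strand, and the standard genetic code is used throughout (so an alternative start codon such as GTG is translated literally). The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second and third base in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extract_orf(sequence, start, end):
    """Extract an ORF given 1-based inclusive coordinates.

    If start > end the ORF is read from the complementary strand.
    """
    if start <= end:
        return sequence[start - 1:end].upper()
    return reverse_complement(sequence[end - 1:start])


def translate(cds):
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


if __name__ == "__main__":
    toy = "AAATTAGGCCATCCC"  # made-up sequence, not taken from the listing
    print(translate(extract_orf(toy, 12, 4)))
```

With the made-up 15-nucleotide sequence shown, positions 12 to 4 read on the complementary strand give ATGGCCTAA, which translates to `MA*`.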



The proteins and peptides encoded by the ORFs A to H in Mptb and
Mavs and the amino acid sequences from homologous genes we have
discovered in Mtb given in Seq. ID Nos 31, 33, 35, 37 and 39, as
described above and fragments thereof, comprise the polypeptides
of the invention. The polypeptides of the invention are believed
to be associated with specific immunoreactivity and with the
pathogenicity of the host micro-organisms from which they were
obtained.

The present invention thus provides a polynucleotide in 
substantially isolated form which is capable of selectively 
hybridising to sequence ID Nos 3 or 4 or a fragment thereof. The 
polynucleotide fragment may alternatively comprise a sequence 
selected from the group of Seq.ID. No: 5, 7, 9, 11, 13, 15, 17, 
19, 21, 23, 25 and 27. The invention further provides a 
polynucleotide in substantially isolated form whose sequence 
consists essentially of a sequence selected from the group Seq 
ID Nos. 30, 32, 34, 36 and 38, or a corresponding sequence 
selectively hybridizable thereto, or a fragment of said sequence 
or corresponding sequence.

The invention further provides diagnostic probes such as a probe 
which comprises a fragment of at least 15 nucleotides of a 
polynucleotide of the invention, or a peptide nucleic acid or 
similar synthetic sequence specific ligand, optionally carrying 
a revealing label. The invention also provides a vector carrying 
a polynucleotide as defined above, particularly an expression 
vector. 

The invention further provides a polypeptide in substantially
isolated form which comprises any one of the sequences selected
from the group consisting of Seq. ID No: 6, 8, 10, 12, 14, 16, 18,
20, 22, 24, 26, 28, 29, 31, 33, 35, 37 and 39, or a polypeptide
substantially homologous thereto. The invention additionally
provides a polypeptide fragment which comprises a fragment of a
polypeptide defined above, said fragment comprising at least 10
amino acids and an epitope. The invention also provides
polynucleotides in substantially isolated form which encode
polypeptides of the invention, and vectors which comprise such
polynucleotides, as well as antibodies capable of binding such
polypeptides. In an additional aspect, the invention provides
kits comprising polynucleotides, polypeptides, antibodies or
synthetic ligands of the invention and methods of using such kits
in diagnosing the presence or absence of mycobacteria in a
sample. The invention also provides pharmaceutical compositions
comprising polynucleotides of the invention, polypeptides of the
invention or antisense probes and the use of such compositions
in the treatment or prevention of diseases caused by
mycobacteria. The invention also provides polynucleotihe
prevention and treatment of infections due to GS-containing
pathogenic mycobacteria in animals and humans and as a means of
enhancing in vivo susceptibility of said mycobacteria to
antimicrobial drugs. The invention also provides bacteria or
viruses transformed with polynucleotides of the invention for use
as vaccines. The invention further provides Mptb or Mavs in
which all or part of the polynucleotides of the invention have
been deleted or disabled to provide mutated organisms of lower
pathogenicity for use as vaccines in animals and humans. The
invention further provides Mtb in which all or part of the
polynucleotides encoding polypeptides of the invention have been
deleted or disabled to provide mutated organisms of lower
pathogenicity for use as vaccines in animals and humans.

A further aspect of the invention is our discovery of homologies 
between the ORFs B, C and E in GS on the one hand, and Mtb cosmid 
MTCY277 on the other (data from Genbank database using the 
computer programmes BLAST and BLIXEM). The homologous ORFs in
MTCY277 are adjacent to one another consistent with the form of 
another pathogenicity island in Mtb. A further aspect of the 
invention is our discovery of homologies between ORFs E and F in 
GS, and Mtb cosmid MT024 (also Genbank, as above) with the 
homologous ORFs close to one another. The use of polynucleotides 
and polypeptides from Mtb (Seq. ID Nos 30, 31, 32, 33, 34, 35, 36,
37, 38 and 39) in substantially isolated form as diagnostics, 
vaccines and targets for chemotherapy, for the management and 
prevention of Mtb infections in humans and animals, and the 
processes involved in the preparation and use of these 
diagnostics, vaccines and new chemotherapeutic agents, comprise 
further aspects of the invention. 
A. Polynucleotides

Polynucleotides of the invention as defined herein may comprise 
DNA or RNA. They may also be polynucleotides which include 
within them synthetic or modified nucleotides or peptide nucleic 
acids. A number of different types of modification to 
oligonucleotides are known in the art. These include 
methylphosphonate and phosphorothioate backbones, addition of 
acridine or polylysine chains at the 3' and/or 5' ends of the
molecule. For the purposes of the present invention, it is to 
be understood that the polynucleotides described herein may be 
modified by any method available in the art. Such modifications 
may be carried out in order to couple the said polynucleotide to 
a solid phase or to enhance the recognition, the in vivo 
activity, or the lifespan of polynucleotides of the invention. 

A number of different types of polynucleotides of the invention 
are envisaged. In the broadest aspect, polynucleotides and 
fragments thereof capable of hybridizing to SEQ ID NO: 3 or 4 form 
a first aspect of the invention. This includes the 

polynucleotide of SEQ ID NO: 3 or 4. Within this class of 
polynucleotides various sub-classes of polynucleotides are of 
particular interest.

One sub-class of polynucleotides which is of interest is the 
class of polynucleotides encoding the open reading frames A, B, 
C, D, E, F, G and H, including SEQ ID NOs: 5, 7, 9, 11, 13, 15,
17, 19, 21, 23, 25 and 27. As discussed below, polynucleotides 
encoding ORF H include the polynucleotide sequences 7953 to 7006 
and 7009 to 6215 within SEQ ID NO: 27, as well as modified 
sequences in which the frame-shift has been modified so that the 
two sub-reading frames are placed in a single reading frame. 
This may be desirable where the polypeptide is to be produced in
recombinant expression systems.

The invention thus provides a polynucleotide in substantially
isolated form which encodes any one of these ORFs or combinations
thereof. Combinations thereof includes combinations of 2, 3, 4,
5 or all of the ORFs. Polynucleotides may be provided which
comprise an individual ORF carried in a recombinant vector
including the vectors described herein. Thus in one preferred
aspect the invention provides a polynucleotide in substantially
isolated form capable of selectively hybridizing to the nucleic
acid comprising ORFs A to F of the core region of the Mptb and
Mavs pathogenicity islands of the invention. Fragments thereof
corresponding to ORFs A to E, B to F, A to D, B to E, A to C, B
to D or any two adjacent ORFs are also included in the invention.

Polynucleotides of the invention will be capable of selectively
hybridizing to the corresponding portion of the GS region, or to
the corresponding ORFs of Mtb described herein. The term
"selectively hybridizing" indicates that the polynucleotides will
hybridize, under conditions of medium to high stringency (for
example 0.03 M sodium chloride and 0.03 M sodium citrate at from
about 50°C to about 60°C) to the corresponding portion of SEQ ID
NO: 3 or 4 or the complementary strands thereof but not to genomic
DNA from mycobacteria which are usually non-pathogenic including
non-pathogenic species of M. avium. Such polynucleotides will
generally be at least 68%, e.g. at least 70%,
preferably at least 80 or 90% and more preferably at least 95%
homologous to the corresponding DNA of GS. The corresponding
portion will be over a region of at least 20, preferably at
least 30, for instance at least 40, 60 or 100 or more contiguous
nucleotides.

By "corresponding portion" it is meant a sequence from the GS
region of the same or substantially similar size which has been
determined, for example by computer alignment, to have the
greatest degree of homology to the polynucleotide.

Any combination of the above mentioned degrees of homology and
minimum sizes may be used to define polynucleotides of the
invention, with the more stringent combinations (i.e. higher
homology over longer lengths) being preferred. Thus for example
a polynucleotide which is at least 80% homologous over 25,
preferably 30 nucleotides forms one aspect of the invention, as
does a polynucleotide which is at least 90% homologous over 40
nucleotides.
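
A combined criterion of this kind (a minimum percentage over a minimum number of contiguous nucleotides) can be sketched as below. The sketch makes simplifying assumptions that are not part of the specification: the candidate and its corresponding portion are already aligned without gaps and have equal length, and percent identity is used as a stand-in for "homology". The function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_criterion(candidate, corresponding, min_identity, min_length):
    """True if any window of min_length contiguous positions reaches min_identity.

    Both sequences are assumed to be pre-aligned without gaps.
    """
    if len(candidate) != len(corresponding):
        raise ValueError("sequences must be of equal length")
    if len(candidate) < min_length:
        return False
    last_start = len(candidate) - min_length
    return any(
        percent_identity(candidate[i:i + min_length],
                         corresponding[i:i + min_length]) >= min_identity
        for i in range(last_start + 1)
    )


if __name__ == "__main__":
    reference = "ACGT" * 10            # made-up 40-nucleotide reference
    candidate = "TTTT" + reference[4:]  # three mismatches in the first codon-sized block
    print(percent_identity(candidate, reference))
    print(meets_criterion(candidate, reference, 90, 40))
```

In practice a gapped alignment tool would be used to find the corresponding portion first; the window test above only illustrates how the percentage and length thresholds combine.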

A further class of polynucleotides of the invention is the class 
of polynucleotides encoding polypeptides of the invention, the 
polypeptides of the invention being defined in section B below. 
Due to the redundancy of the genetic code as such, 
polynucleotides may be of a lower degree of homology than 
required for selective hybridization to the GS region. However, 
when such polynucleotides encode polypeptides of the invention 
these polynucleotides form a further aspect. It may for example 
be desirable where polypeptides of the invention are produced 
recombinantly to increase the GC content of such polynucleotides. 
This increase in GC content may result in higher levels of 
expression via codon usage more appropriate to the host cell in 
which recombinant expression is taking place. 
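
The idea of raising GC content through synonymous codon changes, without altering the encoded polypeptide, can be illustrated with a minimal sketch. The substitution map below is a deliberately tiny, hypothetical subset (leucine TTA to CTG, alanine GCT to GCC, lysine AAA to AAG); a real design would draw on the codon usage of the chosen host, which the specification does not specify.

```python
# Hypothetical, deliberately small map of synonymous substitutions that raise GC content.
# Each pair encodes the same amino acid in the standard genetic code.
GC_RICHER_SYNONYM = {
    "TTA": "CTG",  # leucine
    "GCT": "GCC",  # alanine
    "AAA": "AAG",  # lysine
}


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def raise_gc(cds):
    """Replace codons with GC-richer synonyms where the map provides one."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(GC_RICHER_SYNONYM.get(codon, codon) for codon in codons)


if __name__ == "__main__":
    original = "TTAGCTAAA"  # made-up example: Leu-Ala-Lys
    recoded = raise_gc(original)
    print(original, round(gc_content(original), 3))
    print(recoded, round(gc_content(recoded), 3))
```

The recoded sequence encodes the same three amino acids while its GC fraction rises from 2/9 to 6/9 in this toy case.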

An additional class of polynucleotides of the invention are those 
obtainable from cosmids MTCY277 and MT024 (containing Mtb genomic 
sequences), which polynucleotides consist essentially of the
fragment of the cosmid containing an open reading frame encoding 
any one of the homologous ORFs B, C, E or F respectively. Such 
polynucleotides are referred to below as Mtb polynucleotides. 
However, where reference is made to polynucleotides in general 
such reference includes Mtb polynucleotides unless the context 
is explicitly to the contrary. In addition, the invention 
provides polynucleotides which encode the same polypeptide as the 
abovementioned ORFs of Mtb but which, due to the redundancy of 
the genetic code, have different nucleotide sequences. These 
form further Mtb polynucleotides of the invention. Fragments of 
Mtb polynucleotides suitable for use as probes or primers also
form a further aspect of the invention. 

The invention further provides polynucleotides in substantially 
isolated form capable of selectively hybridizing (where 
selectively hybridizing is as defined above) to the Mtb 
polynucleotides of the invention. 
The invention further provides the Mtb polynucleotides of the
invention linked, at either the 5' and/or 3' end to
polynucleotide sequences to which they are not naturally
contiguous. Such sequences will typically be sequences found in
cloning or expression vectors, such as promoters, 5' untranslated
sequence, 3' untranslated sequence or termination sequences. The
sequences may also include further coding sequences such as
signal sequences used in recombinant production of proteins.

Further polynucleotides of the invention are illustrated in the
accompanying examples.

Polynucleotides of the invention may be used to produce a primer,
e.g. a PCR primer, a primer for an alternative amplification
reaction, a probe e.g. labelled with a revealing label by
conventional means using radioactive or non-radioactive labels
or a probe linked covalently to a solid phase, or the
polynucleotides may be cloned into vectors. Such primers,
probes and other fragments will be at least 15, preferably at
least 20, for example at least 25, 30 or 40 or more nucleotides
in length, and are also encompassed by the term polynucleotides
of the invention as used herein.

Primers of the invention which are preferred include primers
directed to any part of the ORFs defined herein. The ORFs from
other isolates of pathogenic mycobacteria which contain a GS
region may be determined and conserved regions within each
individual ORF may be identified. Primers directed to such
conserved regions form a further preferred aspect of the
invention. In addition, the primers and other polynucleotides
of the invention may be used to identify, obtain and isolate ORFs
capable of selectively hybridizing to the polynucleotides of the
invention which are present in pathogenic mycobacteria but which
are not part of a pathogenicity island in that particular species
of bacteria. Thus in addition to the ORFs B, C, E and F which
have been identified in Mtb, similar ORFs may be identified in
other pathogens and ORFs corresponding to the GS ORFs C, D, E,
F and H, may also be identified.
Polynucleotides such as DNA polynucleotides and probes according
to the invention may be produced recombinantly, synthetically,
or by any means available to those of skill in the art. They may
also be cloned by standard techniques.

In general, primers will be produced by synthetic means,
involving a step-wise manufacture of the desired nucleic acid
sequence one nucleotide at a time. Techniques for accomplishing
this using automated techniques are readily available in the art.
Longer polynucleotides will generally be produced using
recombinant means, for example using a PCR (polymerase chain
reaction) cloning technique. This will involve making a pair
of primers (e.g. of about 15-30 nucleotides) to a region of GS
which it is desired to clone, bringing the primers into contact
with genomic DNA from a mycobacterium or a vector carrying the
GS sequence, performing a polymerase chain reaction under
conditions which bring about amplification of the desired region,
isolating the amplified fragment (e.g. by purifying the reaction
mixture on an agarose gel) and recovering the amplified DNA. The
primers may be designed to contain suitable restriction enzyme
recognition sites so that the amplified DNA can be cloned into
a suitable cloning vector.
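
The primer-pair step described above can be sketched in a few lines. This is a simplified illustration rather than a primer design procedure from the specification: it takes the forward primer from the start of the target region and the reverse primer as the reverse complement of its end, and optionally prepends a restriction site to each 5' end. The function name, the default length of 20 and the choice of sites (EcoRI, GAATTC; BamHI, GGATCC) are assumptions for illustration, and melting temperature, secondary structure and specificity are not considered.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primer_pair(template, start, end, length=20, fwd_site="", rev_site=""):
    """Return (forward, reverse) primers flanking template[start..end].

    Coordinates are 1-based and inclusive on the given strand. Optional
    restriction sites are added to the 5' end of each primer.
    """
    if not (1 <= start <= end <= len(template)):
        raise ValueError("coordinates out of range")
    region = template[start - 1:end].upper()
    if len(region) < 2 * length:
        raise ValueError("region too short for two non-overlapping primers")
    forward = fwd_site + region[:length]
    reverse = rev_site + reverse_complement(region[-length:])
    return forward, reverse


if __name__ == "__main__":
    toy_template = "ACGTTGCAAGCTTAGGCCAT"  # made-up sequence for illustration
    fwd, rev = design_primer_pair(
        toy_template, 1, 20, length=5, fwd_site="GAATTC", rev_site="GGATCC"
    )
    print(fwd, rev)
```

For the made-up 20-nucleotide template this yields a forward primer GAATTCACGTT and a reverse primer GGATCCATGGC.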

Such techniques may be used to obtain all or part of the GS or 
ORF sequences described herein, as well as further genomic clones 
containing full open reading frames. Although in general such 
techniques are well known in the art, reference may be made in 
particular to Sambrook J., Fritsch EF., Maniatis T (1989). 
Molecular cloning: a Laboratory Manual, 2nd edn. Cold Spring 
Harbor, New York, Cold Spring Harbor Laboratory. 

Polynucleotides which are not 100% homologous to the sequences 
of the present invention but fall within the scope of the 
invention can be obtained in a number of ways.

Other isolates or strains of pathogenic mycobacteria will be 
expected to contain allelic variants of the GS sequences 
described herein, and these may be obtained for example by 
probing genomic DNA libraries made from such isolates or strains 
of bacteria using GS or ORF sequences as probes under conditions
of medium to high stringency (for example 0.03M sodium chloride
and 0.03M sodium citrate at from about 50°C to about 60°C).

A particularly preferred group of pathogenic mycobacteria are
isolates of M. paratuberculosis. Polynucleotides based on GS
regions from such bacteria are particularly preferred. Preferred
fragments of such regions include fragments encoding individual 
open reading frames including the preferred groups and 
combinations of open reading frames discussed above. 

Alternatively, such polynucleotides may be obtained by site 
directed mutagenesis of the GS or ORF sequences or allelic 
variants thereof. This may be useful where for example silent 
codon changes are required to sequences to optimise codon 
preferences for a particular host cell in which the 
polynucleotide sequences are being expressed. Other sequence 
changes may be desired in order to introduce restriction enzyme 
recognition sites, or to alter the property or function of the 
polypeptides encoded by the polynucleotides of the invention. 
Such altered property or function will include the addition of 
amino acid sequences of consensus signal peptides known in the 
art to effect transport and secretion of the modified polypeptide 
of the invention. Another altered property will include 
mutagenesis of a catalytic residue or generation of fusion
proteins with another polypeptide. Such fusion proteins may be 
with an enzyme, with an antibody or with a cytokine or other 
ligand for a receptor, to target a polypeptide of the invention 
to a specific cell type in vitro or in vivo. 

The invention further provides double stranded polynucleotides 
comprising a polynucleotide of the invention and its complement. 

Polynucleotides or primers of the invention may carry a revealing
label. Suitable labels include radioisotopes such as 32P or 35S,
enzyme labels, other protein labels or smaller labels such as
biotin or fluorophores. Such labels may be added to
polynucleotides or primers of the invention and may be detected
by techniques known per se.
Polynucleotides or primers of the invention or fragments thereof
labelled or unlabelled may be used by a person skilled in the art
in nucleic acid-based tests for the presence or absence of Mptb,
Mavs, other GS-containing pathogenic mycobacteria, or Mtb applied
to samples of body fluids, tissues, or excreta from animals and
humans, as well as to food and environmental samples such as
river or ground water and domestic water supplies.

Human and animal body fluids include sputum, blood, serum,
plasma, saliva, milk, urine, CSF, semen, faeces and infected
discharges. Tissues include intestine, mouth ulcers, skin, lymph
nodes, spleen, lung and liver obtained surgically or by a biopsy
technique. Animals particularly include commercial livestock
such as cattle, sheep, goats, deer, rabbits but wild animals and
animals in zoos may also be tested.

Such tests comprise bringing a human or animal body fluid or
tissue extract, or an extract of an environmental or food sample,
into contact with a probe comprising a polynucleotide or primer
of the invention under hybridising conditions and detecting any
duplex formed between the probe and nucleic acid in the sample.
Such detection may be achieved using techniques such as PCR or
by immobilising the probe on a solid support, removing nucleic
acid in the sample which is not hybridized to the probe, and then
detecting nucleic acid which has hybridized to the probe.
Alternatively, the sample nucleic acid may be immobilized on a
solid support, and the amount of probe bound to such a support
can be detected. Suitable assay methods of this and other
formats can be found in for example WO89/03891 and WO90/13667.

Polynucleotides of the invention or fragments thereof labelled
or unlabelled may also be used to identify and characterise
different strains of Mptb, Mavs, other GS-containing pathogenic
mycobacteria, or Mtb, and properties such as drug resistance or
susceptibility.

The probes of the invention may conveniently be packaged in the
form of a test kit in a suitable container. In such kits the
probe may be bound to a solid support where the assay format for
which the kit is designed requires such binding. The kit may
also contain suitable reagents for treating the sample to be
probed, hybridising the probe to nucleic acid in the sample,
control reagents, instructions, and the like.

The use of polynucleotides of the invention in the diagnosis of 
inflammatory diseases such as Crohn's disease or sarcoidosis in 
humans or Johne's disease in animals form a preferred aspect of 
the invention. The polynucleotides may also be used in the 
prognosis of these diseases. For example, the response of a 
human or animal subject in response to antibiotic, vaccination 
or other therapies may be monitored by utilizing the diagnostic 
methods of the invention over the course of a period of treatment 
and following such treatment. 

The use of Mtb polynucleotides (particularly in the form of 
probes and primers) of the invention in the above-described 
methods forms a further aspect of the invention, particularly for
the detection, diagnosis or prognosis of Mtb infections. 

B. Polypeptides.

Polypeptides of the invention include polypeptides in 
substantially isolated form encoded by GS. This includes the 
full length polypeptides encoded by the positive and 
complementary negative strands of GS. Each of the full length
polypeptides will contain one of the amino acid sequences set out
in Seq ID NOs: 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28 and
29. Polypeptides of the invention further include variants of 
such sequences, including naturally occurring allelic variants 
and synthetic variants which are substantially homologous to said 
polypeptides. In this context, substantial homology is regarded 
as a sequence which has at least 70%, e.g. 80%, 90%, 95% or 98% 
amino acid homology (identity) over 30 or more, e.g. 40, 50 or 100
amino acids. For example, one group of substantially homologous
polypeptides are those which have at least 95% amino acid
identity to a polypeptide of any one of Seq ID NOs: 6, 8, 10, 12,
14, 16, 18, 20, 22, 24, 26, 28 and 29 over their entire length. 
Even more preferably, this homology is 98%. 
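
The homology thresholds above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (no alignment algorithm is implemented), and the function names and default values are illustrative only, chosen to mirror the 70% over 30 residues figure given above.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    # Count positions where both sequences carry the same residue (gaps excluded).
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def is_substantially_homologous(a: str, b: str,
                                threshold: float = 70.0,
                                window: int = 30) -> bool:
    """True if any stretch of `window` aligned residues reaches `threshold`."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if len(a) < window:
        return False
    return any(
        percent_identity(a[i:i + window], b[i:i + window]) >= threshold
        for i in range(len(a) - window + 1)
    )
```

Raising `threshold` to 95.0 and setting `window` to the full sequence length corresponds to the more preferred group of polypeptides described above.
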

Polypeptides of the invention further include the polypeptide
sequences of the homologous ORFs of Mtb, namely Seq ID NOs. 31,
33, 35, 37 and 39. Unless explicitly specified to the contrary, 
reference to polypeptides of the invention and their fragments 
includes these Mtb polypeptides and fragments, and variants
thereof (substantially homologous to said sequences) as defined
herein. 

Polypeptides of the invention may be obtained by the standard 
techniques mentioned above. Polypeptides of the invention also
include fragments of the above mentioned full length polypeptides
and variants thereof, including fragments of the sequences set
out in SEQ ID NOs: 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28,
29, 31, 33, 35, 37 and 39. Such fragments, for example of 8, 10,
12, 15 or up to 30 or 40 amino acids, may also be obtained
synthetically using standard techniques known in the art.

Preferred fragments include those which include an epitope, 
especially an epitope which is specific to the pathogenicity of 
the mycobacterial cell from which the polypeptide is derived. 
Suitable fragments will be at least about 5, e.g. 8, 10, 12, 15
or 20 amino acids in size, or larger. Epitopes may be determined
by techniques such as peptide scanning as
described by Geysen et al., Mol. Immunol., 23; 709-715 (1986), as
well as other techniques known in the art. 
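
As an illustration of the peptide scanning approach, the sketch below enumerates the overlapping fragments of a polypeptide that would be synthesised and tested individually for antibody binding. It only lists candidate sequences; the function name, the default fragment length of 8 and the step of 1 are assumptions made for the example, not parameters taken from the cited reference.

```python
from typing import List, Tuple


def overlapping_peptides(sequence: str, length: int = 8,
                         step: int = 1) -> List[Tuple[int, str]]:
    """Return (1-based start position, peptide) for each overlapping fragment."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    # Slide a fixed-length window along the sequence; a sequence shorter
    # than `length` yields an empty list.
    return [
        (start + 1, sequence[start:start + length])
        for start in range(0, len(sequence) - length + 1, step)
    ]
```

For a 10-residue sequence and a fragment length of 8, this yields three peptides starting at positions 1, 2 and 3.
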

The term "an epitope which is specific to the pathogenicity of
the mycobacterial cell" means that the epitope is encoded by a
portion of the GS region, or by the corresponding ORF sequences
of Mtb, which can be used to distinguish mycobacteria which are
pathogenic from related non-pathogenic mycobacteria including
non-pathogenic species of M. avium. This may be determined using
routine methodology. A candidate epitope from an ORF may be
prepared and used to immunise an animal such as a rat or rabbit 
in order to generate antibodies. The antibodies may then be used 
to detect the presence of the epitope in pathogenic mycobacteria 
and to confirm that non-pathogenic mycobacteria do not contain 
any proteins which react with the epitope. Epitopes may be
linear or conformational.

Polypeptides of the invention may be in a substantially isolated
form. It will be understood that the polypeptide may be mixed 
with carriers or diluents which will not interfere with the 
intended purpose of the polypeptide and still be regarded as 
substantially isolated. A polypeptide of the invention may also 
be in a substantially purified form, in which case it will 
generally comprise the polypeptide in a preparation in which more 
than 90%, e.g. 95%, 98% or 99% of the polypeptide in the 
preparation is a polypeptide of the invention. 

Polypeptides of the invention may be modified to confer a desired 
property or function, for example by the addition of histidine
residues to assist their purification or by the addition of a 
signal sequence to promote their secretion from a cell. 

Thus, polypeptides of the invention include fusion proteins which 
comprise a polypeptide encoded by all or part of one or more
ORFs of the invention fused at the N- or C-terminus to a second
sequence to provide the desired property or function. Sequences
which promote secretion from a cell include, for example, the
yeast α-factor signal sequence.

A polypeptide of the invention may be labelled with a revealing 
label. The revealing label may be any suitable label which 
allows the polypeptide to be detected. Suitable labels include 
radioisotopes, e.g. ¹²⁵I or ³⁵S, enzymes, antibodies, polynucleotides
and ligands such as biotin. Labelled polypeptides of the 
invention may be used in diagnostic procedures such as 
immunoassays in order to determine the amount of a polypeptide 
of the invention in a sample. Polypeptides or labelled 
polypeptides of the invention may also be used in serological or 
cell mediated immune assays for the detection of immune 
reactivity to said polypeptides in animals and humans using 
standard protocols. 

A polypeptide or labelled polypeptide of the invention or 
fragment thereof may also be fixed to a solid phase, for example 
the surface of an immunoassay well, microparticle, dipstick or 
biosensor. Such labelled and/or immobilized polypeptides may be 
packaged into kits in a suitable container along with suitable
reagents, controls, instructions and the like. 

Such polypeptides and kits may be used in methods of detection 
of antibodies or cell mediated immunoreactivity, to the 
mycobacterial proteins and peptides encoded by the ORFs of the 
invention and their allelic variants and fragments, using 
immunoassay. Such host antibodies or cell mediated immune 
reactivity will occur in humans or animals with an immune system 
which detects and reacts against polypeptides of the invention. 
The antibodies may be present in a biological sample from such 
humans or animals, where the biological sample may be a sample 
as defined above, particularly blood, milk or saliva.

Immunoassay methods are well known in the art and will generally 
comprise:

(a) providing a polypeptide of the invention comprising an 
epitope bindable by an antibody against said 
mycobacterial polypeptide; 

(b) incubating a biological sample with said polypeptide 
under conditions which allow for the formation of an 
antibody-antigen complex; and

(c) determining whether antibody-antigen complex
comprising said polypeptide is formed. 

Immunoassay methods for cell mediated immune reactivity in 
animals and humans are also well known in the art (e.g. as 
described by Weir et al. 1994, J. Immunol. Methods 176; 93-101) and
will generally comprise:

(a) providing a polypeptide of the invention comprising an 
epitope bindable by a lymphocyte or macrophage or 
other cell receptor; 

(b) incubating a cell sample with said polypeptide under 
conditions which allow for a cellular immune response 
such as release of cytokines or other mediator to 
occur; and 

(c) detecting the presence of said cytokine or mediator in 
the incubate. 

Polypeptides of the invention may be made by standard synthetic
means well known in the art or recombinantly, as described below.

Polypeptides of the invention or fragments thereof labelled or 
unlabelled may also be used to identify and characterise 
different strains of Mptb, Mavs, other GS-containing pathogenic
mycobacteria, or Mtb, and properties such as drug resistance or
susceptibility.

The polypeptides of the invention may conveniently be packaged 
in the form of a test kit in a suitable container. In such kits 
the polypeptide may be bound to a solid support where the assay
format for which the kit is designed requires such binding. The 
kit may also contain suitable reagents for treating the sample 
to be examined, control reagents, instructions, and the like. 

The use of polypeptides of the invention in the diagnosis of 
inflammatory diseases such as Crohn's disease or sarcoidosis in
humans or Johne's disease in animals forms a preferred aspect of
the invention. The polypeptides may also be used in the
prognosis of these diseases. For example, the response of a
human or animal subject to antibiotic or other
therapies may be monitored by utilizing the diagnostic methods
of the invention over the course of a period of treatment and
following such treatment.

The use of Mtb polypeptides of the invention in the
above-described methods forms a further aspect of the invention,
particularly for the detection, diagnosis or prognosis of Mtb
infections.

Polypeptides of the invention may also be used in assay methods 
for identifying candidate chemical compounds which will be useful 
in inhibiting, binding to or disrupting the function of said 
polypeptides required for pathogenicity. In general, such assays
involve bringing the polypeptide into contact with a candidate 
inhibitor compound and observing the ability of the compound to 
disrupt, bind to or interfere with the polypeptide.

There are a number of ways in which the assay may be formatted.
For example, those polypeptides which have an enzymatic function 
may be assayed using labelled substrates for the enzyme, and the 
amount of, or rate of, conversion of the substrate into a product 
measured, e.g. by chromatography such as HPLC or by a colourimetric
assay. Suitable labels include ³⁵S, ¹²⁵I, biotin or enzymes such
as horseradish peroxidase.

For example, the gene product of ORF C is believed to have
GDP-mannose dehydratase activity. Thus an assay for inhibitors of
the gene product may utilise, for example, labelled GDP-mannose,
GDP or mannose, and the activity of the gene product followed. ORF
D encodes a gene related to the synthesis and regulation of
capsular polysaccharides, which are often associated with
invasiveness and pathogenicity. Labelled polysaccharide
substrates may be used in assays of the ORF D gene product. The
gene product of ORF F is a protein with putative glucosyl
transferase activity and thus labelled amino sugars such as
β-1-3-N-acetylglucosamine may be used as substrates in assays.

Candidate chemical compounds which may be used may be natural or 
synthetic chemical compounds used in drug screening programmes. 
Extracts of plants which contain several characterised or 
uncharacterised components may also be used.

Alternatively, a polypeptide of the invention may be screened
against a panel of peptides, nucleic acids or other chemical 
functionalities which are generated by combinatorial chemistry. 
This will allow the definition of chemical entities which bind 
to polypeptides of the invention. Typically, the polypeptide of 
the invention will be brought into contact with a panel of 
compounds from a combinatorial library, with either the panel
or the polypeptide being immobilized on a solid phase, under
conditions suitable for the polypeptide to bind to the panel. 
The solid phase will then be washed under conditions in which 
only specific interactions between the polypeptide and individual 
members of the panel are retained, and those specific members may 
be utilized in further assays or used to design further panels 
of candidate compounds.

A number of assay methods to define peptide
interaction with peptides are known. For example, WO86/00991
describes a method for determining mimotopes which comprises 
making panels of catamer preparations, for example octamers of 
amino acids, at which one or more of the positions is defined and
the remaining positions are randomly made up of other amino 
acids, determining which catamer binds to a protein of interest 
and re-screening the protein of interest against a further panel 
based on the most reactive catamer in which one or more 
additional designated positions are systematically varied. This
may be repeated throughout a number of cycles and used to build 
up a sequence of a binding candidate compound of interest. 
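
The iterative panel strategy described above can be sketched as follows. This is only an illustration under stated assumptions: `X` stands for a position filled randomly with other amino acids, the `score` callable is a hypothetical stand-in for a measured binding signal, and the function names are not taken from the cited publication.

```python
from typing import Callable, Iterable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def next_panel(template: str, position: int) -> List[str]:
    """Define one further position of `template` with each amino acid in turn."""
    if template[position] != "X":
        raise ValueError("position is already defined")
    return [template[:position] + aa + template[position + 1:]
            for aa in AMINO_ACIDS]


def refine(template: str, positions: Iterable[int],
           score: Callable[[str], float]) -> str:
    """Repeat the screen: keep the most reactive catamer, then vary the next position."""
    for position in positions:
        template = max(next_panel(template, position), key=score)
    return template
```

Starting from an all-`X` octamer, each cycle fixes one more position, so eight cycles build up a fully defined candidate sequence.
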

WO89/03430 describes screening methods which permit the 
preparation of specific mimotopes which mimic the immunological
activity of a desired analyte. These mimotopes are identified
by reacting a panel of individual peptides, wherein said peptides
are of systematically varying hydrophobicity, amphipathic
characteristics and charge patterns, with an antibody against
an antigen of interest. Thus in the present case antibodies
against a polypeptide of the invention may be employed and
mimotope peptides from such panels may be identified.

C. Vectors.

Polynucleotides of the invention can be incorporated into a 
recombinant replicable vector. The vector may be used to
replicate the nucleic acid in a compatible host cell. Thus in
a further embodiment, the invention provides a method of making
polynucleotides of the invention by introducing a polynucleotide
of the invention into a replicable vector, introducing the vector
into a compatible host cell, and growing the host cell under
conditions which bring about replication of the vector. The
vector may be recovered from the host cell. Suitable host cells
are described below in connection with expression vectors. 

D. Expression Vectors. 

Preferably, a polynucleotide of the invention in a vector is
operably linked to a control sequence which is capable of 
providing for the expression of the coding sequence by the host 
cell, i.e. the vector is an expression vector. The term "operably 
linked" refers to a juxtaposition wherein the components 
described are in a relationship permitting them to function in 
their intended manner. A control sequence "operably linked" to 
a coding sequence is ligated in such a way that expression of the 
coding sequence is achieved under conditions compatible with the 
control sequences. Such vectors may be transformed into a 
suitable host cell as described above to provide for expression 
of a polypeptide of the invention. Thus, in a further aspect the 
invention provides a process for preparing polypeptides according 
to the invention which comprises cultivating a host cell 
transformed or transfected with an expression vector as described 
above, under conditions to provide for expression by the vector 
of a coding sequence encoding the polypeptides, and recovering 
the expressed polypeptides. 

A further embodiment of the invention provides vectors for the 
replication and expression of polynucleotides of the invention, 
or fragments thereof. The vectors may be, for example, plasmid,
virus or phage vectors provided with an origin of replication, 
optionally a promoter for the expression of the said 
polynucleotide and optionally a regulator of the promoter. The 
vectors may contain one or more selectable marker genes, for 
example an ampicillin resistance gene in the case of a bacterial 
plasmid or a neomycin resistance gene for a mammalian vector. 
Vectors may be used in vitro, for example for the production of 
RNA, or used to transfect or transform a host cell. The vector
may also be adapted to be used in vivo, for example in a method 
of naked DNA vaccination or gene therapy. A further embodiment 
of the invention provides host cells transformed or transfected 
with the vectors for the replication and expression of 
polynucleotides of the invention, including the DNA of GS, the 
open reading frames thereof and other corresponding ORFs 
particularly ORFs B, C, E and F from Mtb. The cells will be 
chosen to be compatible with the said vector and may for example 
be bacterial, yeast, insect or mammalian. 

Expression vectors are widely available in the art and can be
obtained commercially. Mammalian expression vectors may comprise
a mammalian or viral promoter. Mammalian promoters include the
metallothionein promoter. Viral promoters include promoters from
adenovirus, the SV40 large T promoter and retroviral LTR 
promoters. Promoters compatible with insect cells include the 
polyhedrin promoter. Yeast promoters include the alcohol 
dehydrogenase promoter. Bacterial promoters include the 
β-galactosidase promoter.

The expression vectors may also comprise enhancers, and in the 
case of eukaryotic vectors a polyadenylation signal sequence
downstream of the coding sequence being expressed. 

Polypeptides of the invention may be expressed in suitable host 
cells, for example bacterial, yeast, plant, insect and mammalian 
cells, and recovered using standard purification techniques 
including, for example affinity chromatography, HPLC or other 
chromatographic separation techniques. 

Polynucleotides according to the invention may also be inserted 
into the vectors described above in an antisense orientation in 
order to provide for the production of antisense RNA. Antisense
RNA or other antisense polynucleotides or ligands may also be
produced by synthetic means. Such antisense polynucleotides may
be used in a method of controlling the levels of the proteins
encoded by the ORFs of the invention in a mycobacterial cell.
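
For illustration, the relationship between a coding sequence and its antisense RNA can be sketched as below. The sketch assumes the input is the sense (coding) strand written 5' to 3' using only the letters A, C, G and T; the function names are illustrative.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(coding_strand: str) -> str:
    """Antisense RNA (5' to 3') complementary to the mRNA of `coding_strand`."""
    # The antisense transcript is the reverse complement of the coding
    # strand, with uracil in place of thymine.
    return reverse_complement(coding_strand).replace("T", "U").replace("t", "u")
```

Inserting a coding sequence into an expression vector in the reverse orientation causes the promoter to transcribe this complementary strand.
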

Polynucleotides of the invention may also be carried by vectors 
suitable for gene therapy methods. Such gene therapy methods 
include those designed to provide vaccination against diseases 
caused by pathogenic mycobacteria or to boost the immune response 
of a human or animal infected with a pathogenic mycobacterium.

For example, Ziegner et al., AIDS, 1995, 9; 43-50 describe the use
of a replication defective recombinant amphotropic retrovirus to 
boost the immune response in patients with HIV infection. Such 
a retrovirus may be modified to carry a polynucleotide encoding 
a polypeptide or fragment thereof of the invention and the 
retrovirus delivered to the cells of a human or animal subject
in order to provide an immune response against said polypeptide.
The retrovirus may be delivered directly to the patient or may
be used to infect cells ex vivo, e.g. fibroblast cells, which
are then introduced into the patient, optionally after being 
inactivated. The cells are desirably autologous or HLA-matched 
cells from the human or animal subject. 

Gene therapy methods, including methods for boosting an immune
response to a particular pathogen, are disclosed generally in, for
example, WO95/14091, the disclosure of which is incorporated herein
by reference. Recombinant viral vectors include retroviral 
vectors, adenoviral vectors, adeno-associated viral vectors, 
vaccinia virus vectors, herpes virus vectors and alphavirus 
vectors. Alphavirus vectors are described in, for example,
WO95/07994, the disclosure of which is incorporated herein by
reference.

Where direct administration of the recombinant viral vector is 
contemplated, either in the form of naked nucleic acid or in the 
form of packaged particles carrying the nucleic acid this may be 
done by any suitable means, for example oral administration or 
intravenous injection. From 10= to 10^ c.f.u. of virus represents
a typical dose, which may be repeated for example weekly over a 
period of a few months. Administration of autologous or HLA- 
matched cells infected with the virus may be more convenient in 
some cases. This will generally be achieved by administering
doses, for example from 10^ to 10^ cells per dose which may be 
repeated as described above.

The recombinant viral vector may further comprise nucleic acid 
capable of expressing an accessory molecule of the immune system
designed to increase the immune response. Such a molecule may
be, for example, an interferon, particularly interferon gamma, an
interleukin, for example IL-1α, IL-1β or IL-2, or an HLA class
I or II molecule. This may be particularly desirable where the
vector is intended for use in the treatment of humans or animals
already infected with a mycobacterium and it is desired to boost
the immune response.

E. Antibodies.

The invention also provides monoclonal or polyclonal antibodies 
to polypeptides of the invention or fragments thereof. The 
invention further provides a process for the production of 
monoclonal or polyclonal antibodies to polypeptides of the 
invention. Monoclonal antibodies may be prepared by conventional 
hybridoma technology using the polypeptides of the invention or 
peptide fragments thereof, as immunogens. Polyclonal antibodies
may also be prepared by conventional means which comprise 
inoculating a host animal, for example a rat or a rabbit, with 
a polypeptide of the invention or peptide fragment thereof and 
recovering immune serum. 

In order that such antibodies may be made, the invention also 
provides polypeptides of the invention or fragments thereof 
haptenised to another polypeptide for use as immunogens in 
animals or humans.

For the purposes of this invention, the term "antibody", unless 
specified to the contrary, includes fragments of whole antibodies 
which retain their binding activity for a polypeptide of the 
invention. Such fragments include Fv, F(ab') and F(ab')2
fragments, as well as single chain antibodies. Furthermore, the
antibodies and fragments thereof may be humanised antibodies,
e.g. as described in EP-A-239400.

Antibodies may be used in methods of detecting polypeptides of 
the invention present in biological samples (where such samples 
include the human or animal body samples, and environmental 
samples, mentioned above) by a method which comprises: 

(a) providing an antibody of the invention; 

(b) incubating a biological sample with said antibody 
under conditions which allow for the formation of an 
antibody-antigen complex; and

(c) determining whether antibody-antigen complex 
comprising said antibody is formed. 

Antibodies of the invention may be bound to a solid support, for
example an immunoassay well, microparticle, dipstick or biosensor 
and/or packaged into kits in a suitable container along with 
suitable reagents, controls, instructions and the like. 

Antibodies of the invention may be used in the detection, 
diagnosis and prognosis of diseases as described above in
relation to polypeptides of the invention. 

F. Compositions.

The present invention also provides compositions comprising a 
polynucleotide or polypeptide of the invention together with a 
carrier or diluent. Compositions of the invention also include 
compositions comprising a nucleic acid, particularly an
expression vector, of the invention. Compositions further
include those carrying a recombinant virus of the invention.
Such compositions include pharmaceutical compositions, in which
case the carrier or diluent will be pharmaceutically acceptable.

Pharmaceutically acceptable carriers or diluents include those 
used in formulations suitable for inhalation as well as oral, 
parenteral (e.g. intramuscular or intravenous or transcutaneous) 
administration. The formulations may conveniently be presented 
in unit dosage form and may be prepared by any of the methods 
well known in the art of pharmacy. Such methods include the step 
of bringing into association the active ingredient with the 
carrier which constitutes one or more accessory ingredients. In 
general the formulations are prepared by uniformly and intimately 
bringing into association the active ingredient with liquid 
carriers or finely divided solid carriers or both, and then, if 
necessary, shaping the product. 

For example, formulations suitable for parenteral administration 
include aqueous and non-aqueous sterile injection solutions which
may contain anti-oxidants, buffers, bacteriostats and solutes
which render the formulation isotonic with the blood of the
intended recipient, and aqueous and non-aqueous sterile
suspensions which may include suspending agents and thickening
agents, and liposomes or other microparticulate systems which are
designed to target the polynucleotide or the polypeptide of the 
invention to blood components or one or more organs, or to target 
cells such as M cells of the intestine after oral administration.

G. Vaccines.

In another aspect, the invention provides novel vaccines for the
prevention and treatment of infections caused by Mptb, Mavs, 
other GS-containing pathogenic mycobacteria and Mtb in animals
and humans. The term "vaccine" as used herein means an agent
used to stimulate the immune system of a vertebrate, particularly
a warm blooded vertebrate including humans, so as to provide
protection against future harm by an organism to which the
vaccine is directed or to assist in the eradication of an
organism in the treatment of established infection. The immune
system will be stimulated by the production of cellular immunity
antibodies, desirably neutralizing antibodies, directed to
epitopes found on or in a pathogenic mycobacterium which
expresses any one of the ORFs of the invention. The antibody so
produced may be any of the immunological classes, such as the
immunoglobulins A, D, E, G or M. Vaccines which stimulate the
production of IgA are of interest since this is the principal
immunoglobulin produced by the secretory system of warm-blooded
animals, and the production of such antibodies will help prevent
infection or colonization of the intestinal tract. However an
IgM and IgG response will also be desirable for systemic
infections such as Crohn's disease or tuberculosis.



Vaccines of the invention include polynucleotides of the
invention or fragments thereof in suitable vectors and
administered by injection of naked DNA using standard protocols.
Polynucleotides of the invention or fragments thereof in suitable
vectors for the expression of the polypeptides of the invention
may be given by injection, inhalation or by mouth. Suitable
vectors include M. bovis BCG, M. smegmatis or other mycobacteria,
Corynebacteria, Salmonella or other agents according to
established protocols.

Polypeptides of the invention or fragments thereof in
substantially isolated form may be used as vaccines by injection, 
inhalation, oral administration or by transcutaneous application 
according to standard protocols. Adjuvants (such as Iscoms or
polylactide-coglycolide encapsulation), cytokines such as IL-12
and other immunomodulators may be used for the selective 
enhancement of the cell mediated or humoral immunological 
responses. Vaccination with polynucleotides and/or polypeptides 
of the invention may be undertaken to increase the susceptibility 
of pathogenic mycobacteria to antimicrobial agents in vivo. 

In instances wherein the polypeptide is correctly configured so 
as to provide the correct epitope, but is too small to be 
immunogenic, the polypeptide may be linked to a suitable carrier. 

A number of techniques for obtaining such linkage are known in
the art, including the formation of disulfide linkages using
N-succinimidyl-3-(2-pyridylthio)propionate (SPDP) and succinimidyl
4-(N-maleimido-methyl)cyclohexane-1-carboxylate (SMCC) obtained
from Pierce Company, Rockford, Illinois (if the peptide lacks
a sulfhydryl group, this can be provided by addition of a
cysteine residue). These reagents create a disulfide linkage
between themselves and peptide cysteine residues on one protein
and an amide linkage through the epsilon-amino on a lysine, or
other free amino group in the other. A variety of such
disulfide/amide-forming agents are known. See, for example,
Immun. Rev. (1982) 62:185. Other bifunctional coupling agents form
a thioether rather than a disulfide linkage. Many of these
thioether-forming agents are commercially available and include
reactive esters of 6-maleimidocaproic acid, 2-bromoacetic acid,
2-iodoacetic acid, 4-(N-maleimido-methyl)cyclohexane-1-carboxylic
acid, and the like. The carboxyl groups can be activated by
combining them with succinimide or 1-hydroxyl-2-nitro-4-sulfonic
acid, sodium salt. An additional method of coupling antigens
employs the rotavirus/"binding peptide" system described in EPO
Pub. No. 259,149, the disclosure of which is incorporated herein
by reference. The foregoing list is not meant to be exhaustive,
and modifications of the named compounds can clearly be used.

Any carrier may be used which does not itself induce the
production of antibodies harmful to the host. Suitable carriers 
are typically large, slowly metabolized macromolecules such as 
proteins; polysaccharides, such as latex functionalized
Sepharose®, agarose, cellulose, cellulose beads and the like;
polymeric amino acids, such as polyglutamic acid, polylysine,
polylactide-coglycolide and the like; amino acid copolymers; and 
inactive virus particles. Especially useful protein substrates 
are serum albumins, keyhole limpet hemocyanin, immunoglobulin 
molecules, thyroglobulin, ovalbumin, tetanus toxoid, and other 
proteins well known to those skilled in the art.

The immunogenicity of the epitopes may also be enhanced by 
preparing them in mammalian or yeast systems fused with or 
assembled with particle-forming proteins such as, for example,
that associated with hepatitis B surface antigen. See, e.g.,
US-A-4,722,840. Constructs wherein the epitope is linked directly
to the particle-forming protein coding sequences produce hybrids
which are immunogenic with respect to the epitope. In addition, 
all of the vectors prepared include epitopes specific to HBV, 
having various degrees of immunogenicity, such as, for example, 
the pre-S peptide. 

In addition, portions of the particle-forming protein coding
sequence may be replaced with codons encoding an epitope of the 
invention. In this replacement, regions which are not required 
to mediate the aggregation of the units to form immunogenic 
particles in yeast or mammals can be deleted, thus eliminating 
additional HBV antigenic sites from competition with the epitope 
of the invention. 

Vaccines may be prepared from one or more immunogenic 
polypeptides of the invention. These polypeptides may be 
expressed in various host cells (e.g., bacteria, yeast, insect, 
or mammalian cells), or alternatively may be isolated from viral
preparations or made synthetically. 

In addition to the above, it is also possible to prepare live 
vaccines of attenuated microorganisms which express one or more 
recombinant polypeptides of the invention. Suitable attenuated
microorganisms are known in the art and include, for example, 
viruses (e.g., vaccinia virus), as well as bacteria. 

The preparation of vaccines which contain an immunogenic 
polypeptide(s) as active ingredients is known to one skilled in
the art. Typically, such vaccines are prepared as injectables,
or as suitably encapsulated oral preparations and either liquid
solutions or suspensions; solid forms suitable for solution in,
or suspension in, liquid prior to ingestion or injection may also
be prepared. The preparation may also be emulsified, or the 
protein encapsulated in liposomes. The active immunogenic 
ingredients are often mixed with excipients which are 
pharmaceutically acceptable and compatible with the active 
ingredient. Suitable excipients are, for example, water, saline, 
dextrose, glycerol, ethanol, or the like and combinations 
thereof. In addition, if desired, the vaccine may contain minor 
amounts of auxiliary substances such as wetting or emulsifying 
agents, pH buffering agents, and/or adjuvants which enhance the 
effectiveness of the vaccine. Examples of adjuvants which may 
be effective include but are not limited to: aluminum hydroxide, 
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-
nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as 
nor-MDP), N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-
(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine 
(CGP 19835A, referred to as MTP-PE), and RIBI, which contains 
three components extracted from bacteria, monophosphoryl lipid 
A, trehalose dimycolate and cell wall skeleton (MPL+TDM+CWS) in 
a 2% squalene/Tween® 80 emulsion. The effectiveness of an 
adjuvant may be determined by measuring the amount of antibodies 
directed against an immunogenic polypeptide containing an 
antigenic sequence resulting from administration of this 
polypeptide in vaccines which are also comprised of the various 
adjuvants. 

The vaccines are conventionally administered parenterally, by 
injection, for example, either subcutaneously or intramuscularly. 
Additional formulations which are suitable for other modes of 
administration include suppositories, oral formulations or 
enemas. For suppositories, traditional binders and carriers may 
include, for example, polyalkylene glycols or triglycerides; such 
suppositories may be formed from mixtures containing the active 
ingredient in the range of 0.5% to 10%, preferably 1% - 2%. Oral 
formulations include such normally employed excipients as, for 
example, pharmaceutical grades of mannitol, lactose, starch, 
magnesium stearate, sodium saccharine, cellulose, magnesium 
carbonate, and the like. These compositions take the form of 
solutions, suspensions, tablets, pills, capsules, sustained 
release formulations or powders and contain 10% - 95% of active 
ingredient, preferably 25% - 70%. 

The proteins may be formulated into the vaccine as neutral or 
salt forms. Pharmaceutically acceptable salts include the acid 
addition salts (formed with free amino groups of the peptide) 
which are formed with inorganic acids such as, for example, 
hydrochloric or phosphoric acids, or organic acids such as 
acetic, oxalic, tartaric, maleic, and the like. Salts formed 
with the free carboxyl groups may also be derived from inorganic 
bases such as, for example, sodium, potassium, ammonium, calcium, 
or ferric hydroxides, and such organic bases as isopropyl amine, 
trimethylamine, 2-ethylamino ethanol, histidine, procaine, and 
the like. 

The vaccines are administered in a manner compatible with the 
dosage formulation, and in such amount as will be 
prophylactically and/or therapeutically effective. The quantity 
to be administered, which is generally in the range of 5 µg to 
250 µg of antigen per dose, depends on the subject to be treated, 
capacity of the subject's immune system to synthesize antibodies, 
mode of administration and the degree of protection desired. 
Precise amounts of active ingredient required to be administered 
may depend on the judgement of the practitioner and may be 
peculiar to each subject. 

The vaccine may be given in a single dose schedule, or preferably 
in a multiple dose schedule. A multiple dose schedule is one in 
which a primary course of vaccination may be with 1-10 separate 
doses, followed by other doses given at subsequent time intervals 
required to maintain and/or reinforce the immune response, for 
example, at 1-4 months for a second dose, and if needed, a 
subsequent dose(s) after several months. The dosage regimen will 
also, at least in part, be determined by the need of the 
individual and be dependent upon the judgement of the 
practitioner. 

In a further aspect of the invention, there is provided an 
attenuated vaccine comprising a normally pathogenic mycobacterium 
which harbours an attenuating mutation in any one of the genes 
encoding a polypeptide of the invention. The gene is selected 
from the group of ORFs A, B, C, D, E, F, G and H, including the 
homologous ORFs B, C, E and F in Mtb. 

The mycobacteria may be used in the form of killed bacteria or 
as a live attenuated vaccine. There are advantages to a live 
attenuated vaccine. The whole live organism is used, rather than 
dead cells or selected cell components which may exhibit modified 
or denatured antigens. Protein antigens in the outer membrane 
will maintain their tertiary and quaternary structures. 
Therefore the potential to elicit a good protective long term 
immunity should be higher. 

The term "mutation" and the like refers to a genetic lesion in 
a gene which renders the gene non-functional. This may be at 
either the level of transcription or translation. The term thus 
envisages deletion of the entire gene or substantial portions 
thereof, and also point mutations in the coding sequence which 
result in truncated gene products unable to carry out the normal 
function of the gene. 

A mutation introduced into a bacterium of the invention will 
generally be a non-reverting attenuating mutation. Non-reverting 
means that for practical purposes the probability of the mutated 
gene being restored to its normal function is small, for example 
less than 1 in 10^ such as less than 1 in 10' or even less than 
1 in 10^^. 

An attenuated mycobacterium of the invention may be in isolated 
form. This is usually desirable when the bacterium is to be used 
for the purposes of vaccination. The term "isolated" means that 
the bacterium is in a form in which it can be cultured, processed 
or otherwise used in a form in which it can be readily identified 
and in which it is substantially uncontaminated by other 
bacterial strains, for example non-attenuated parent strains or 
unrelated bacterial strains. The term "isolated bacterium" thus 
encompasses cultures of a bacterial mutant of the invention, for 
example in the form of colonies on a solid medium or in the form 
of a liquid culture, as well as frozen or dried preparations of 
the strains. 

In a preferred aspect, the attenuated mycobacterium further 
comprises at least one additional mutation. This may be a 
mutation in a gene responsible for the production of products 
essential to bacterial growth which are absent in a human or 
animal host. For example, mutations to the gene for aspartate 
semi-aldehyde dehydrogenase (asd) have been proposed for the 
production of attenuated strains of Salmonella. The asd gene is 
described further in Gene (1993) 129; 123-128. A lesion in the 
asd gene, encoding the enzyme aspartate β-semialdehyde 
dehydrogenase would render the organism auxotrophic for the 
essential nutrient diaminopimelic acid (DAP), which can be provided 
exogenously during bulk culture of the vaccine strain. Since 
this compound is an essential constituent of the cell wall for 
gram-negative and some gram-positive organisms and is absent from 
mammalian or other vertebrate tissues, mutants would undergo 
lysis after about three rounds of division in such tissues. 
Analogous mutations may be made to the attenuated mycobacteria 
of the invention. 

In addition or in the alternative, the attenuated mycobacteria 
may carry a recA mutation. The recA mutation knocks out 
homologous recombination - the process which is exploited for the 
construction of the mutations. Once the recA mutation has been 
incorporated the strain will be unable to repair the constructed 
deletion mutations. Such a mutation will provide attenuated 
strains in which the possibility of homologous recombination 
with DNA from wild-type strains has been minimized. RecA genes 
have been widely studied in the art and their sequences are 
available. Further modifications may be made for additional 
safety. 

The invention further provides a process for preparing a vaccine 
composition comprising an attenuated bacterium according to the 
invention, which process comprises (a) inoculating a culture vessel 
containing a nutrient medium suitable for growth of said 
bacterium; (b) culturing said bacterium; (c) recovering said 
bacteria and (d) mixing said bacteria with a pharmaceutically 
acceptable diluent or carrier. 

Attenuated bacterial strains according to the invention may be 
constructed using recombinant DNA methodology which is known per 
se. In general, bacterial genes may be mutated by a process of 
targeted homologous recombination in which a DNA construct 
containing a mutated form of the gene is introduced into a host 
bacterium which it is desired to attenuate. The construct will 
recombine with the wild-type gene carried by the host and thus 
the mutated gene may be incorporated into the host genome to 
provide a bacterium of the present invention which may then be 
isolated. 

The mutated gene may be obtained by introducing deletions into 
the gene, e.g. by digesting with a restriction enzyme which cuts 
the coding sequence twice to excise a portion of the gene and 
then religating under conditions in which the excised portion is 
not reintroduced into the cut gene. Alternatively frame shift 
mutations may be introduced by cutting with a restriction enzyme 
which leaves overhanging 5' and 3' termini, filling in and/or 
trimming back the overhangs, and religating. Similar mutations 
may be made by site directed mutagenesis. These are only 
examples of the types of techniques which will readily be at the 
disposal of those of skill in the art. 
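
The deletion approach described above can be illustrated with a minimal sketch. It assumes a toy gene and uses the EcoRI recognition sequence GAATTC purely as an example site; the function names `excise_between_sites` and `is_frameshift` are hypothetical, and the model ignores the sticky-end chemistry of a real digest by simply keeping one copy of the site after religation.

```python
def excise_between_sites(gene: str, site: str) -> str:
    """Remove the fragment between the first two occurrences of `site`.

    Simplified model: after religation a single copy of the site remains.
    """
    first = gene.find(site)
    second = gene.find(site, first + 1) if first >= 0 else -1
    if second < 0:
        raise ValueError("the site must occur at least twice in the gene")
    return gene[:first] + gene[second:]


def is_frameshift(original: str, mutated: str) -> bool:
    """A deletion shifts the reading frame when its length is not a multiple of 3."""
    return (len(original) - len(mutated)) % 3 != 0


# Toy coding sequence (an assumption for illustration, not a sequence of the invention).
toy_gene = "ATG" + "GAATTC" + "AAACC" + "GAATTC" + "TAA"
mutated = excise_between_sites(toy_gene, "GAATTC")
print(mutated, is_frameshift(toy_gene, mutated))
```

In this toy case 11 bases are removed, so the downstream reading frame is shifted; a deletion whose length is a multiple of three would instead remove codons while leaving the frame intact.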

Various assays are available to detect successful recombination. 
In the case of attenuations which mutate a target gene necessary 
for the production of an essential metabolite or catabolite 
compound, selection may be carried out by screening for bacteria 
unable to grow in the absence of such a compound. Bacteria may 
also be screened with antibodies or nucleic acids of the 
invention to determine the absence of production of a mutated 
gene product of the invention or to confirm that the genetic 
lesion introduced - e.g. a deletion - has been incorporated into 
the genome of the attenuated strain. 

The concentration of the attenuated strain in the vaccine will 
be formulated to allow convenient unit dosage forms to be 
prepared. Concentrations of from about 10* to 10^ bacteria per 
ml will generally be suitable, e.g. from about 10^ to 10* such as 
about 10^ per ml. Live attenuated organisms may be administered 
subcutaneously or intramuscularly at up to 10* organisms in one 
or more doses, e.g from around 10^ to 10% e.g about 10* or 10'' 
organisms in a single dose. 

The vaccines of the invention may be administered to recipients 
to treat established disease or in order to protect them against 
diseases caused by the corresponding wild type mycobacteria, such 
as inflammatory diseases such as Crohn's disease or sarcoidosis 
in humans or Johne's disease in animals. The vaccine may be 
administered by any suitable route. In general, subcutaneous or 
intramuscular injection is most convenient, but oral, intranasal 
and colorectal administration may also be used. 

The following Examples illustrate aspects of the invention. 

EXAMPLE 1 

Tests for the presence of the GS identifier sequence were 
performed on 5 µl bacterial DNA extracts (25 µg/ml to 500 µg/ml) 
using polymerase chain reaction based on the oligonucleotide 
primers 5'-GATGCCGTGAGGAGGTAAAGCTGC-3' (Seq. ID No. 40) and 
5'-GATACGGCTCTTGAATCCTGCACG-3' (Seq. ID No. 41) from within the 
identifier DNA sequences (Seq. ID Nos. 1 and 2). PCR was performed 
for 40 cycles in the presence of 1.5 mM magnesium and an 
annealing temperature of 58°C. The presence or absence of the 
correct amplification product indicated the presence or absence 
of GS identifier sequence in the corresponding bacterium. GS 
identifier sequence is shown to be present in all the laboratory 
and field strains of Mptb and Mavs tested. This includes Mptb 
isolates 0025 (bovine CVL Weybridge), 0021 (caprine, Moredun), 
0022 (bovine, Moredun), 0139 (human, Chiodini 1984), 0209, 
0208, 0211, 0210, 0212, 0207, 0204, 0206 (bovine, Whipple 1990). 
All Mptb strains were IS900 positive. The Mavs strains include 
0010 and 0012 (woodpigeon, Thorel), 0018 (armadillo, Portaels) and 
0034, 0037, 0038, 0040 (AIDS, Hoffner). All Mavs strains were 
IS902 positive. One pathogenic M.avium strain 0033 (AIDS, 
Hoffner) also contained GS identifier sequence. GS identifier 
sequence is absent from other mycobacteria including other 
M.avium, M.malmoense, M.szulgai, M.gordonae, M.chelonei, 
M.fortuitum, M.phlei, as well as E.coli, S.aureus, Nocardia sp., 
Streptococcus sp., Shigella sp. and Pseudomonas sp. 
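
The primer-based test of Example 1 can be mimicked in silico. The following is a minimal sketch only: it uses the two primer sequences given above, but the template is a toy construct assembled for illustration (an assumption, not a sequence of the invention), and the helper names `reverse_complement` and `predicted_amplicon` are hypothetical. It requires exact primer matches and does not model annealing temperature or magnesium concentration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def predicted_amplicon(template: str, forward: str, reverse: str):
    """Return the product bounded by two exactly matching primers, or None.

    The forward primer is searched on the given strand; the reverse primer
    anneals to the opposite strand, so its reverse complement is searched
    downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end < 0:
        return None
    return template[start:end + len(site)]


FORWARD = "GATGCCGTGAGGAGGTAAAGCTGC"   # Seq. ID No. 40
REVERSE = "GATACGGCTCTTGAATCCTGCACG"   # Seq. ID No. 41

# Toy template: flanking bases, forward site, filler, reverse site, flanking bases.
toy_template = "TTTT" + FORWARD + "ACGTACGTAC" + reverse_complement(REVERSE) + "GGGG"
product = predicted_amplicon(toy_template, FORWARD, REVERSE)
print(len(product) if product else "no product")
```

A template lacking either primer site returns None, which corresponds to the absence of the amplification product in the test described above.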



EXAMPLE 2 

To obtain the full sequence of GS in Mavs and Mptb we generated 
a genomic library of Mavs using the restriction endonuclease 
EcoRI and cloning into the vector pUC18. This achieved a 
representative library which was screened with ³²P-labelled 
identifier sequence yielding a positive clone containing a 17kbp 
insert. We constructed a restriction map of this insert and 
identified GS as fragments unique to Mavs and Mptb and not 
occurring in laboratory strains of M. avium. These fragments 
were sub-cloned into pUC18 and pGEM4Z. We identified GS as 
contained within an 8kb region. The full nucleotide sequence 
was determined for GS on both DNA strands using primer walking 
and automated DNA sequencing. DNA sequence for GS in Mptb was 
obtained using overlapping PCR products generated using Pwo DNA 
polymerase, a proofreading thermostable enzyme. The final DNA 
sequences were derived using the University of Wisconsin GCG gel 
assembly software package. 



EXAMPLE 3 

The DNA sequence of GS in Mavs and Mptb was found to be more 
than 99% homologous. The ORFs encoded in GS were identified 
using GeneRunner and DNAStar computer programmes. Eight ORFs 
were identified and designated GSA, GSB, GSC, GSD, GSE, GSF, GSG 
and GSH. Database comparisons were carried out against the 
GenEMBL Database release version 48.0 (9/96), using the BLAST and 
BLIXEM programmes. GSA and GSB encoded proteins of 13.5kDa and 
30.7kDa respectively, both of unknown functions. GSC encoded 
a protein of 38.4kDa with a 65% homology to the amino acid 
sequence of rfbD of V. cholerae, a 62% amino acid sequence 
homology to gmd of E.coli and a 58% homology to gca of 
Ps. aeruginosa which are all GDP-D-mannose dehydratases. 
Equivalent gene products in H. influenzae, S. dysenteriae, 
Y. enterocolitica, N. gonorrhoeae, K. pneumoniae and rfbD in 
Salmonella enterica are all involved in 'O'-antigen processing 
known to be linked to pathogenicity. GSD encoded a protein of 
37.1kDa which showed 58% homology at the DNA level to wcaG from 
E.coli, a gene involved in the synthesis and regulation of 
capsular polysaccharides, also related to pathogenicity. GSE 
was found to have a > 30% amino acid homology to rfbT of 
V. cholerae, involved in the transport of specific LPS components 
across the cell membrane. In V. cholerae the gene product causes 
a seroconversion from the Inaba to the Ogawa 'epidemic' strain. 
GSF encoded a protein of 30.2kDa which was homologous in the 
range 25-40% at the amino acid level to several glucosyl 
transferases such as rfpA of K. pneumoniae, rfbB of K. pneumoniae, 
lgtD of H. influenzae, lsi of N. gonorrhoeae. In E.coli an 
equivalent gene galE adds β-1-3 N-acetylglucosamine to galactose, 
the latter only found in 'O' and 'M' antigens which are also 
related to pathogenicity. GSH comprising the ORFs GSH^ and GSH^ 
encodes a protein totalling about 60kDa which is a putative 
transposase with a 40 - 43% homology at the amino acid level to 
the equivalent gene product of IS21 in E.coli. This family of 
insertion sequences is broadly distributed amongst gram negative 
bacteria and is responsible for mobility and transposition of 
genetic elements. An IS21-like element in B.fragilis is split 
either side of the β-lactamase gene controlling its activation 
and expression. We programmed an E.coli S30 cell-free extract 
with plasmid DNA containing the ORF GSH under the control of a 
lac promoter in the presence of ³⁵S-methionine, and 
demonstrated the translation of an abundant 60kDa protein. 
The proteins homologous to GS encoded in other organisms are in 
general highly antigenic. Thus the proteins encoded by the ORFs 
in GS may be used in immunoassays of antibody or cell mediated 
immuno-reactivity for diagnosing infections caused by 
mycobacteria, particularly Mptb, Mavs and Mtb. Enhancement of 
host immune recognition of GS encoded proteins by vaccination 
using naked specific DNA or recombinant GS proteins, may be used 
in the prevention and treatment of infections caused by Mptb, 
Mavs and Mtb in humans and animals. Mutation or deletion of all 
or some of the ORFs A to H in GS may be used to generate 
attenuated strains of Mptb, Mavs or Mtb with lower pathogenicity 
for use as living or killed vaccines in humans and animals. Such 
vaccines are particularly relevant to Johne's disease in animals, 
to diseases caused by Mptb in humans such as Crohn's disease, and 
to the management of tuberculosis especially where the disease 
is caused by multiple drug-resistant organisms. 
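
The ORF identification in Example 3 was carried out with the GeneRunner and DNAStar programmes; the following is only a simplified sketch of the underlying idea, not a reproduction of those tools. It scans the three forward reading frames for ATG-to-stop stretches (alternative start codons and the reverse strand are ignored, which is an assumption made for brevity) and estimates protein mass with the common rule of thumb of about 110 Da per residue. The names `find_orfs` and `approx_mass_kda` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2):
    """Return (start, end, n_codons) for ATG..stop ORFs in the 3 forward frames.

    `start` and `end` are 0-based, end-exclusive and include the stop codon;
    `n_codons` counts coding codons only (the stop codon is excluded).
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    found.append((start, pos + 3, n_codons))
                start = None
    return found


def approx_mass_kda(n_residues: int, da_per_residue: float = 110.0) -> float:
    """Rough protein mass estimate; 110 Da per residue is an assumed average."""
    return n_residues * da_per_residue / 1000.0


toy_seq = "CCATGAAACCCGGGTAGTT"  # toy sequence for illustration only
print(find_orfs(toy_seq))
print(approx_mass_kda(545))
```

Under the assumed average residue mass, a protein of about 60kDa such as the GSH product would correspond to roughly 545 residues.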

SEQUENCE LISTING 

Seq. ID No.l 

5'- 1 GATCCAACTA AACCCGATGG AACCCCGCGC AAACTATTGG ACGTCTCCGC GCTACGCAGT 
61 TGGGHGGCG CCCGCGAATC GCACTGAAAG AGGGCATCGA TGCAACGGTG TCGTGGTACC 
121 GCACAAATGC CGATGCCGTG AGGAGGTAAA GCTGCGGGCC GGCCGATGTT ATCCCTCCGG 
181 CCGGACGGGT AGGGCGACCT GCCATCGAGT GGTACGGCAG TCGCCTGGCC GGCGAGGCGC 
241 ATGGCCTATG TGAGTATCCC ATAGCCTGGC nCGCTCGCC CCTACGCATT ATCAGTTGAC 
301 CGCTTTCGCG CCACGTCGCA GGCTTGCGGC AGCATCCCGT TCAGGTCTCC TCATGGTCCG 
361 GTGTGGCACG ACCACGCAA6 CTCGAACCGA CTCGTTTCCC AATTTCGCAT GCTAATATCG 
421 CTCGATGGAT TTTTTGCGCA ACGCCGGCH GATGGCTCGT AACGTTAGCA CCGAGATGCT 
481 GCGCCACTCC GAACGAAAGC GCCTAHAGT AAACCAAGTC GAAGCATACG GAGTCAACGT 
541 TGTTATTGAT GTCGGTGCTA ACTCCGGCCA GTTCGGTAGC GCTTTGCGTC GTGCAGGAH 
601 CAAGAGCCGT ATCGTTTCCT TTGAACCTCT TTCGGGGCCA TTTGCGCAAC TAACGCGCAA 
661 GTCGGCATCG GATC -3' 



Seq. ID No. 2 

5'- 1 GATCCGATGC CGACTTGCGC GTTAGTTGCG CAAATGGCCC CGAAAGAGGT TCAAAGGAAA 
61 CGATACGGCT CTTGAATCCT GCACGACGCA AAGCGCTACC GAACTGGCCG GAGTTAGCAC 
121 CGACATCAAT AACAACGTTG ACTCCGTATG CTTCGACTTG GTTTACTAAT AGGCGCTTTC 
181 GTTCGGAGTG GCGCAGCATC TCGGTGCTAA CGTTACGAGC CATCAAGCCG GCGTTGCGCA 
241 AAAAATCCAT CGAGCGATAT TAGCATGCGA AATTGGGAAA CGAGTCGGTT CGAGCTTGCG 
301 TGGTCGTGCC ACACCGGACC ATGAGGAGAC CTGAACGGGA TGCTGCCGC& AGCCTGCGAC 
361 GT6GCGCGAA AGCGGTCAAC TGATAATGCG TAGGGGCGAG CCAAGCCAGG CTATGGGATA 
421 CTCACATAGG CCATGCGCCT CGCCGGCCAG GCGACTGCCG TACCACTCGA TGGCAGGTCG 
481 CCCTACCCGT CCGGCCGGAG GGATAACATC GGCCGGCCCG CAGCTTTACC TCCTCACGGC 
541 ATCGGCATTT GTGCGGTACC ACGACACCGT TGCATCGATG CCCTCTTTCA GTGCGATTCG 
601 CGGGCGCCAA CCCAACTGCG TAGCGCGGAG ACGTCCAATA GTTTGCGCGG GGTTCCATCG 
661 GGTTTAGTTG GATC -3' 

Seq. ID No. 3 



1 GAATTCTGGG TTGGAGACGA CGTCGAACTC CTGGTCGGTC TTGCTTCX3AA 
51 TGATCGCTGT GATCTGGTCG GCGGTGCCGA CAGGAACCGT CGACTTGTCG 
101 ACGATCACCT TGTACCGGTC GATGTATGAC CCAATGTCGT CCGCAACCGA 
151 GAAGACGTAC GTCAGGTCCG CCGCCCCGCT TTCACCCATG GGCGTCGGGA 
201 CGGCGATGAA AATGACGTCC GCX5TGCTCGA TTCCGCGTTG CCGGTCGGTG 
251 GTGAAGTCAA TCAGCCCGTT CTCACGGTTC CTCGCAATCA ACTCCCAACC 
301 CGGGCTCGAA AATCGGGACA CTGCCTGCGA GGAGCAAATC GATCTTGGCC 
351 TGATCGATAT CGACACAGAC GACATCGTTG CCGCTATCCG CGAGACAGGC 
401 GCCCGTGACG AGGCCTACAT AGCCTGATCC GACCACCGAA ATTTTCAAGA 
451 TGACCCCTTC AAGTCCCCGA TCGGTCGACG ACCATACTGC CGCAACTCTG 
501 TACCCTCCGT GGGTAATTCG CATGTCGCGT TCGTAAGGAG CAGCCAGCGA 
551 GTCGGGGAC6 TTCGGTGAGA GAGTCGCAGG ACTAC6AGGT TGCCGGTGCG 
601 ATACATCACA GTGTTGCGTC TGTCGGCAAC GATGCAGCAA GAACCCACGG 
651 GGCAGCCCTG AACTGCGCGC ATGACCGGTC CTTGTCCTGG CACCTTTGAT 
701 CGGCCACCGC TTCCATGCGA ACATGACCGG AATCCATAGC GCGTGGTCAA 
751 GCAGCGGGGA GGTAGACGTC GGTGTCATCT GCTCCAACCG TGTCGGTGAT 
801 AACGATTTCG CTGAACGATC TCGAGGGATT GAAAAGCACC GTGGAGAGCG 
851 TTCGCGCGCA GCGCTATGGG GGGCGAATCG AGCACATCGT CATCGACGGT 
901 GGATCGGGCG ACGCCGTCGT GGAGTATCTG TCCGGCGATC CTGGCTTTGC 
951 ATATTGGCAA TCTCAGCCCG ACAACGGGAG ATATGACGCG ATGAATCAGG 
1001 GCATTGCCCA TTCGTCGGGC GACCTGTTGT GGTTTATGCA CTCCACGGAT 
1051 CGTTTCTCCX3 ATCCAGATGC AGTCGCTTCC GTGGTGGAGG CGCTCTCGGG 
1101 GCATGGACCA GTACGTGATT TGTGGGGTTA CGGGAAAAAC AACCTTGTCG 
1151 GACTCGACGG CAAACCACTT TTCCCTCGGC CGTACGGCTA TATGCCGTTT 
1201 AAGATGCGGA AATTTCTGCT CGGCGCGACG GTTGCGCATC AGGCGACATT 
1251 CTTCGGCGCG TCGCTGGTAG CCAAGTTGGG CGGTTACGAT CTTGATTTTG 
1301 GACTCGAGGC GGACCAGCTG TTCATCTACC GTGCCGCACT AATACGGCCT 
1351 CCCGTCACGA TCGACCGCGT GGTTTGCGAC TTCGATGTCA CGGGACCTGG 
1401 TTCSiACCCAG CCCATCCGTG AGCACTATCG GACCCTGCGG CGGCTCTGGG 
1451 ACCTGCATGG CGACTACCCG CTGGGTGGGC GCAGAGTGTC GTGGGCTTAC 
1501 TTGCGTGTGA AGGAGTACTT GATTCGGGCC GACCTGGCCG CATTCAACGC 
1551 GGTAAAGTTC TTGCGAGCGA AGTTCGCCAG AGCTTCGCGG AAGCAAAATT 
1601 CATAGAAACC AACTTCTACT GCCTGACCTG AGCAGCGCCG AGGCGCGCAG 
1651 CGCGATCAGT GCGACCTGAA CGGCCAGGTG GAAAGCGCCA CCGATCCCGG 
1701 CACCGAGTGC CTGACGCTTC GGATCCCTTG CACCACAACG AGAGTGAGAG 
1751 CGCCATGATG AGGAAATATC GGCTGGGCGG AGTCAACGCC GGAGTGACAA 
1801 AAGTGAGAAC CCGGTGAAGC GAGCGCTTAT AACAGGGATC ACGGGGCAGG 
1851 ATGGTTCCTA CCTCGCCGAG CTACTACTGA GCAAGGGATA CGAGGTTCAC 
1901 GGGCTCGTTC GTCGAGCTTC GACGTTTAAC ACGTCGCGGA TCGATCACCT 
1951 CTACGTTGAC CCACACCAAC CGGGCGCGCG CTTGTTCTTG CACTATGCAG 
2001 ACCTCACTGA CGGCACCCGG TTGGTGACCC TGCTCAGCAG TATCGACCCG 
2051 GATGAGGTCT ACAACCTCGC AGCGCAGTCC CATGTGCGCG TCAGCTTTGA 
2101 CGAGCCAGTG CATACCGGAG ACACCACCGG CATGGGATCG ATCCGACTTC 
2151 TGGAAGCAGT CCGCCTTTCT CGGGTGGACT GCCGGTTCTA TCAGGCTTCC 
2201 TCGTCGGAGA TGTTCGGCGC ATCTCCGCCA CCGCAGAACG AATCGACGCC 
2251 GTTCTATCCC CGTTCGCCAT ACGGCGCGGC CAAG6TCTTC TCGTACTGGA 
2301 CGACTCGCAA CTATCGAGAG GCGTACGGAT TATTCGCAGT GAATGGCATC 
2351 TTGTTCAACC ATGAGTCCCC CCGGCGCGGC GAGACTTTCG TGACCCGAAA 
2401 GATCACGCGT GCCGTGGCGC GCATCCGAGC TGGCGTCCAA TCGGAGGTCT 
2451 ATATGGGCAA CCTCGATGCG ATCCGCGACT GGGGCTACGC GCCCGAATAT 
2501 GTCGAGGGGA TGTGGAGGAT GTTGCAAGCG CCTGAACCTG ATGACTACGT 
2551 CCTGGCGACA GGGCGTGGTT ACACCGTACG TGAGTTCGCT CAAGCTGCTT 
2601 TTGACCATGT CGGGCTCGAC TGGCAAAAGC GCGTCAAGTT TGACGACCGC 
2651 TATTTGCGTC CCACCGAGGT CGATTCGCTA GTAGGAGATG CCGACAAGGC 
2701 GGCCCAGTCA CTCGGCTGGA AAGCTTCGGT TCATACTGGT GAACTCGCGC 
2751 GCATCATGGT GGACGCGGAC ATCGCCGCGT TGGAGTGCGA TGGCACACCA 
2801 TGGATCGACA CGCCGATGTT GCCTGGTTGG GGCAGAGTAA GTTGACGACT 
2851 ACACCTGGGC CTCTGGACCG CGCAACGCCC GTGTATATCG CCGGTCATCG 
2901 GGGGCTGGTC GGCTCAGCGC TCGTACGTAG ATTTGAGGCC GAGGGGTTCA 
2951 CCAATCTCAT TGTGCGATCA CGCGATGAGA TTGATCTGAC GGACCGAGCC 
3001 GCAACGTTTG ATTTTGTGTC TGAGACAAGA CCACAGGTGA TCATCGATGC 
3051 GGCCGCACGG GTCGGCGGCA TCATGGCGAA TAACACCTAT CCCGCGGACT 
3101 TCTT6TCCX3A AAACCTCCGA ATCCAGACCA ATTTGCTCGA CGCAGCTGTC 
3151 GCCGTGCGTG TGCCGCGGCT CCTTTTCCTC GGTTCGTCAT GCATCTACCC 
3201 GAAGTACGCT CCGCAACCTA TCCACGAGAG TGCTTTATTG ACTGGCCCTT 
3251 TGGAGCCCAC CAACGACGCG TATGCGATCG CCAAGATCGC CGGTATCCTG 
3301 CAAGTTCAGG CGGTTAGGCG CCAATATGGG CTGGCGTGGA TCTCTGCGAT 
3351 GCCGACTAAC CTCTACGGAC CCGGCGACAA CTTCTCCCCG TCCGGGTCGC 
3401 ATCTCTTGCC GGCGCTCATC CGTCGATATG AGGAAGCCRA AGCTGGTGGT 
3451 GCAGAAGAGG TGACGAATTG GGGGACCGGT ACTCCGCGGC GCGAACTTCT 
3501 GCATGTCGAC GATCTGGCGA GCGCATGCCT GTTCCTTTTG GAACATTTCG 
3551 ATGGTCCGAA CCACGTCAAC GTGGGCACCG GCGTCGATCA CAGCATTAGC 
3601 GAGATCGCAG ACATGGTCGC TACAGCGGTG GGCTACATCG GCGAAACACG 
3651 TTGGGATCCA ACTAAACCCG ATGGAACCCC GOSCAAACTA TTGGACGTCT 
3701 CCGCGCTACG CGAGTTGGGT TGGCGCCCGC GAATCGCACT GAAAGACGGC 
3751 ATCGATGCAA CGGTGTCGTG GTACCGCACA AATGCCGATG CCGTGAGGAG 
3801 GTAAAGCTGC GGGTCGGCCG ATGTTATCCC TCCGGCCGGA CGGGTGGG6C 
3851 GACCTGCCGT CGAGTGGTAC GGCAGTCGCC TGGCCGGCGA GGCGCGTGGC 
3901 CTATGGGAGT ATCCAATAGC CTGGCTTGGC TCGCCCCTAC GCATTATCAG 
3951 TTGACCGCTT TCGCX3CCAGC TCGCAGGCTT GCGGCAGCAT CCCGTTCAGG 
4001 TCTCCTCATG GTCCGGTGTG GCACGACCAC GCAAGCTCGA ACCGACTCGT 
4051 TTCCCAATTT CGCATGCTAA TATCGCTCGA TGGATTTTTT GCGCAACGCC 
4101 GGCTTGATGG CTCGTAACGT TAGTACCGAG ATGCTGCGCC ACTTCGAACG 
4151 AAAGCGCCTA TTAGTAAACC AATTCAAAGC ATACGGAGTC AACGTTGTTA 
4201 TTGATGTCGG TGCTAACTCC GGCCAGTTCG GTAGCGCTTT GCGTCGTGCA 
4251 GGATTCAAGA GCCGTATCGT TTCCTTTGAA CCTCTTTCGG GGCCATTTGC 
4301 GCAACTAACG CGCAAGTCGG CATCGGATCC ACTATGGGAG TGTCACCAGT 
4351 ATGCCCTAGG CGACGCCGAT GAGACGATTA CCATCAATGT GGCAGGCAAT 
4401 GCGGGGGCAA GTAGTTCCGT GCTGCCGATG CTTAAAAGTC ATCAAGATGC 
4451 CTTTCCTCCC GCGAATTATA TTGGCACCGA AGACGTTGCA ATACACCGCC 
4501 TTGATTCGGT TGCATCAGAA TTTCTGAACC CTACCGATGT TACTTTCCTG 
4551 AAGATCGACG TACAGGGTTT CGAGAAGCAG GTTATCACGG GCAGTAAGTC 
4601 AACGCTTAAC GAAAGCTGCG TCGGCATGCA ACTCGAACTT TCTTTTATTC 
4651 CGTTGTAC6A AGGTGACATG CTGATTCATG AAGCGCTTGA ACTTGTCTAT 
4701 TCCCTAGGTT TCAGACTGAC GGGTTTGTTG CCCGGCTTTA CGGATCCGCG 
4751 CAATGGTCGA ATGCTTCAAG CTGACGGCAT TTTCTTCCGT GGGGACGATT 
4801 GACATAAATG CTCCGTCGGC ACCCTGCCGG TATCCAAACG GGCGATCTGG 
4851 TGAGCCGGCC TCCCGGGCAC CTAATCGACT ATCTAAATTG AGGCGGCCGC 
4901 GACGTGCGGC ACGAACAGGT GGCCGGCTGC TAGCGTTACA CACGTCATGA 
4951 CTGCGCCAGT GTTCTCGATA ATTATCCCTA CCTTCAATGC AGCGGTGACG 
5001 CTGCAAGCCT GCCTCGGAAG CATCGTCGGG CAGACCTACC GGGAAGTGGA 
5051 AGTGGTCCTT GTCGACGGCG GTTCGACCGR TCGGACCCTC GACATCGCGA 
5101 ACAGTTTCCG CCCGGAACTC GGCTCGCGAC TGGTCGTTCA CAGCGGGCCC 
5151 GATGATGGCC CCTACGACGC CATGAACCGC GGCGTCGGCG TGGCCACAGG 
5201 CGAATGGGTA CTTTTTTTAG GCGCCGACGA CACCCTCTAC GAACCAACCA 
5251 CGTTGGCCCA GGTAGCCGCT TTTCTCGGCG ACCATGCGGC AAGCCATCTT 
5301 GTCTATGGCG ATGTTGTGAT GCGTTCGACG AAAAGCCGGC ATGCCGGACC 
5351 TTTCGACCTC GACCGCCTCC TATTTGAGAC GAATTTGTGC CACCAATCGA 
5401 TCTTTTACCG CCGTGAGCTT TTCGACGGCA TC6GCCCTTA CAACCTGCGC 
5451 TACCX3AGTCT GGGCGGACTG GGACTTCAAT ATTCGCTGCT TCTCCAACCC 
5501 GGCGCTGATT ACCCGCTACA TGGACGTCGT GATTTCCGAA TACAACGACA 
5551 TGACCGGCTT CAGCATGAGG CAGGGGACTG ATAAAGAGTT CAGAAAACGG 
5601 CTGCCAATGT ACTTCTGGGT TGCAGGGTGG GAGACTTGCA GGCGCATGCT 
5651 GGCGTTTTTG AAAGACAAGG AGAATCGCCG TCTGGCCTTG CGTACGCGGT 
5701 TGATAAGGGT TAAGGCCGTC TCCAAAGAAC GAAGCGCAGA ACCGTAGTCG 
5751 CGGATCCACA TTGGACTTCT TTAACGCGTT TGC6TCCTGA TCCACCTTTC 
5801 AAGCCCGTTC CGCGTAACGC GGCGCGCAGA GAGTGGTCGC ATATCGCATC 
5851 ACTGTTCTCG TGCCAGTGCT TGGAAAGCGT CGAGCACTCT GGTTCGCGTT 
5901 CTTGACGTTC GCGCCCGCTC CTAGAGGTAG CGTGTCACGT GACTGAAGCC 
5951 AATGAGTGCA ACTCGGCGTC GCGAAAGGTT TCAGTCGCGG TTGAGCAAGA 
6001 CACCGCAAGA CTACTGGAGT GCGTGCACAA GCGCCTCCAG CTCGCGGCTG 
6051 AAAGCGGATG CAAAGGGATT CGAAGCTTGA GCAACATGCG AAGGGGAGAA 
6101 CGGCCTATGA GGCTGGGACA GGTTTTCGAT CCGCGCGCGA ATGCACTGTC 
6151 AATGGCCAAG TAGAAGTCCC CGCTGGTGGC CAGCAGAAGT CCCCACTCCG 
6201 CTGCGGGTGG TTGGCTAATT CTTGGCGGCT CCCTTCTTGT GGTCGGCGTG 
6251 GCGCATCCGG TAGGACTCGC CGGAGGTGAC GACGATGCTG GCGTGGTGCA 
6301 GCAGCCGATC GAGGATGCTG GCGGCGGTGG TGTGCTCGGG CAGGAATCGC 
6351 CCCCATTGTT CGAAGGGCCA ATGCGAGGCG ATGGCCAGGG AGCG6CGCTC 
6401 GTAGCCGGCA GCCACGAGCC GGAACAACAG TTGAGTCCCG GTGTCGTCGA 
6451 GCGGGGCGAA GCCGATCTCG TCCAAGATGA CCAGATCCGC GCGGAGCAGG 
6501 GTGTCGATGA TCTTGCCGAC GGTGTTGTCG GCCAGGCCGC GGTAGAGGAC 
6551 CTCGATCAGG TCGGCGGCGG TGAAGTAGCG GACTTTGAAT CCGGCGTGGA 
6601 CGGCAGCGTG CCCGCAGCCG ATGAGCAGGT GACTTTTGCC CGTACCAGGT 
6651 GGGCCAATGA CCGCCAGGTT CTGTTGTGCC CGAATCCATT CCAGGCTCGA 
6701 CAGGTAGTCG AACGTGGCTG OSGTGATCGA CGATCCGGTG ACGTCGAACC 
6751 CGTCGAGGGT CTTGGTGACC GGGAAGGCTG CGGCCTTGAG ACGGTTGGCG 
6801 GTGTTGGAGG CATCGCGGGC AGCGATCTCG GCCTCAACCA ACGTCCGCAG 
6851 GATCTCCTCC GGTGTCCAGC GTTGCGTCTT GGCGACTTGC AACACCTCGG 
6901 CGGCGTTGCG GCGCACCGTG GCCAGCTTCA ACCGCCGCAG CGCCGCGTCA 
6951 AGGTCAGCAG CCAGCGGTGC CGCCGAGGAC GGTGCCACCG GCTTGGCAGC 
7001 GGTGGTCATG AGGCCGTCCC GTCGGTGGTG TTGATCTTGT AGGCCTCCAA 
7051 CGAGCGGGTC TCGACGGTGG GCAGATCGAG CACGAGTGCG TCGGCGGCGG 
7101 GGCGGGGTTG TGGGGTGCCG GCGCCGGCGG CCAGGATCGA GCGCACGTCG 
7151 GCAGCGCGGA ACCGGCGAAA CGCAACCGCC CGGCGCAGCG CGTCAATCAA 
7201 AGCCTGTTCG CCGTGGGCGG CGCCAAGGCC GAGCAGAATG TCGAGTTCGG 
7251 ATTTCAGTCG GGTGTTGCCG ATCGCAGCAG CACCGACGAG GAACTGCTGC 
7301 GCTTCGGTTC CCAATGCGCA GAATCGTTTC TCTGCTTGGG TTTTCGGGCG 
7351 AGGACCACGC GAGGGTGCGG GTCTGG6TCC GTCGTAGTGT TCATCGAGGA 
7401 TGGACACCTC ACCTGGGCTG ACGAGCTCGT GCTCGGCCAC GATCACACCG 
7451 GTCGCAGGTT CCAACAGGAT CAGGGCGCCA TGATCGACCA CCACCGCCAC 
7501 GGTGGCACCG ACGAGCCGCT GAGGCACCGA GTAACGAGCT GAGCCGTAAC 
7551 GGATGCACGA GAGGCCGTCG ACCTTACGGC GCACCGACCC CGAGCCGATC 
7601 GTCGGCCGCA GCGAGGGCAG CTCCCTCAAG ACGGTGCGCT CGTCAACCAA 
7651 GCGATCGTTG GGCACGGCGC AGATCTCCGA GTGGACCGTG GCATTGACCT 
7701 CGGCGCACCA TAGTTGCGCC TGGGCGTTGA GGGCACGTAG GTCGACCTGC 
7751 TCACCGGCTA ACGCAGCTTC GGTCAGCAGC GGCACCGCAA GGTCGTCCTG 
7801 AGCGTAGCCA CAGAGGTTCT CCACGATGCC CTTCGATTGC GGATCCGCAC 
7851 CGTGGCAC3AA GTCCGGAACG AAGCCATAGT GGGACGCGAA TCGCACATAA 
7901 TCCGGTGTTG GAACAACAAC ATTGGCGACG ACACCACCTT TGAGGCAGCC 
7951 CATCCGGTCG GCCAGGATCT TGGCCGGAAC CCCACCGATC GCCTC 



Seq. ID No. 4 

1 TTCTACTGCC TGACCTGAGC AGCGCCGAGG CGCGCAGCGC GATCACTGCG ACCTGAATGG 
61 CCAGGTGGAA AGCGCCACCG ATCCCGGCAC CGAGTGCCTG ACGATTCGGA TCCCTTGCAC 
121 CACAACGAGA GTGAGACCGC CATGATGACG AAATATCGGC TGGGCGGAGT CAACGCCGGA 
181 GTGACAAAAG TGAGAACCCG GTGAAGCGAG CGCTTATAAC AGGGATCACG GGGCAGGATG 
241 GTTCCTACCT CGCCGAGCTA CTACTGAGCA AGGGATACGA GGTTCACGGG CTCGTTCGTC 
301 GAGCTTCGAC GTTTAACACG TCGCGGATCG ATCACCTCTA CGTTGACCCA CACCAJiCCGG 
361 GCGCGCGCTT GTTCTTGCAC TATGCAGACC TCACTGACGG CACCCGGTTG GTGACCCTGC 
421 TCAGCAGTAT CGACCCGGAT GAGGTCTACA ACCTCGCAGC GCAGTCCCAT GTGCGCGTCA 
481 GCTTTGACGA GCCAGTGCAT ACCGGAGACA CCACCGGCAT GGGATCGATC CGACrrCTGG 
541 AAGCAGTCCG CCTTTCTCGG GTGGACTGCC GGTTCTATCA GGCTTCCTCG TCGGAGATGT 
601 TCGGCGCATC TCCGCCACCG CAGAACGAAT CGACGCCGTT CTATCCCCGT TCGCCATACG 
661 GCGCGGCCAA GGTCTTCTCG TACTGGACGA CTCGCAACTA TCGAGAGGCG TACGGATTAT 
721 TCGCAGTGAA TGGCATCTTG TTCAACCATG AGTCCCCCCG GCGCGGCGAG ACTTTCGTGA 
781 CCCGAAAGAT CACGCGTGCC GTGGCGCGCA TCCGAGCTGG CGTCCAATCG GAGGTCTATA 
841 TGGGCAACCT CGATGCGATC CGCGACTGGG GCTACGCGCC CGAATATGTC GAGGGGAT6T 
901 GGAGGATGTT GCAAGCGCCT GAACCTGATG ACTACGTCCT GGCGACAGGG CGTGGTTACA 
961 CCGTACGTGA GTTCGCTCAA GCTGCTTTTG ACCACGTCGG GCTCGACTGG CAAAAGCACG 
1021 TCAAGTTTGA CGACCGCTAT TTGCGCCCCA CCGAGGTCGA TTCGCTAGTA GGAGATGCCG 
1081 ACAGGGCGGC CCAGTCACTC GGCTGGAAAG CTTCGGTTCA TACTGGTGAA CTCGCGCGCA 
1141 TCATGGTGGA CGCGGACATC GCCGCGTCGG AGTGCGATGG CACACCATGG ATCGS.CACGC 
1201 CGATGTTGCC TGGTTGGGGC GGAGTAAGTT GACGACTACA CCTGGGCCTC TGGACCGCGC 
1261 AACGCCCGTG TATATCGCCG GTCATCGGGG GCTGGTCGGC TCAGCGCTCG TACGTAGATT 
1321 TGAGGCCGAG GGGTTCACCA ATCTCATTGT GCGATCACGC GATGAGATTG ATCTGACGGA 
1381 CCGAGCCGCA ACGTTTGATT TTGTGTCTGA GACAAGACCA CAGGTGATCA TCGATGCGGC 
1441 CGCACGGGTC GGCGGCATCA TGGCGAATAA CACCTATCCC GCGGACTTCT TGTCCGAAAA 
1501 CCTCCGAATC CAGACCAATT TGCTCGACGC AGCTGTCGCC GTGCGTGTGC CGCGGCTCCT 
1561 TTTCCTCGGT TCGTCATGCA TCTACCCGAA GTACGCTCCG CAACCTATCC ACGAGAGTGC 
1621 TTTATTGACT GGCCCTTTGG AGCCCACCAA CGACGCGTAT GCGATCGCCA AGATCGCCGG 
1681 TATCCTGCAA GTTCAGGCGG TTAGGCGCCA ATATGGGCTG GCGTGGATCT CTGCGATGCC 
1741 GACTAACCTC TACGGACCCG GCGACAACTT CTCCCCGTCC GGGTCGCATC TCTTGCCGGC 
1801 GCTCATCCGT CGATATGAGG AAGCCAAAGC TGGTGGTGCA GAA6AGGTGA CGAATTGGGG 
1861 GACCGGTACT CCGCGGCGCG AACTTCTGCA TGTCGACGAT CTGGCGAGCG CATGCCTGTT 
1921 CCTTTTGGAA CATTTCGATG GTCCGAACCA CGTCAACGTG GGCACCGGCG TCGATCACAG 
1981 CATTAGCGAG ATCGCAGACA TGGTCGCTAC GGCGGTGGGC TACATCGGCG AAACACGTTG 
2041 GGATCCAACT AAACCCGATG GAACCCCGCG CAAACTATTG GACGTCTCCG CGCTACGCGA 
2101 GTTGGGTTGG CGCCCGCGAA TCGCAGTGAA AGACGGCATC GATGCAACGG TGTCGTGGTA
2161 CCGCACAAAT GCCGATGCCG TGAGGAGGTA AAGCTGCGGG CCGGCCGATG TTATCCCTCC 
2221 GGCCGGACGG GTAGGGCGAC CTGCCATCGA GTGGTACGGC AGTCGCCTGG CCGGCGAGGC 
2281 GCATGGCCTA TGGGAGTATC CCATAGCCTG GCTTGGCTCG CCCCTACGCA TTATCAGTTG 
2341 ACCGCTTTCG CGCCAGCTCG CAGGCTCGCG GCAGCATCCC GTTCAGGTCT CCTCATGGTC 
2401 CGGTGTGGCA CGACCACGCA AGCTCGAACC GACTCGTTTC CCAATTTCGC ATGCTAATAT 
2461 CGCTCGATGG ATTTTTTGCG CAACGCCGGC TTGATGGCTC GTAACGTTAG CACCGAGATG 
2521 CTGCGCCRCT TCGAACGAAA GCGCCTATTA GTAAACCAAT TCAAAGCATA CGGAGTCAAC 
2581 GTTGTTATTG ATGTCGGTGC TAACTCCGGC CAGTTCGGTA GCGCTTTGCG TCGTGCAGGA 
2641 TTCAAGAGCC GTATCGTTTC CTTTGAACCT CTTTCGGGGC CATTTGCGCA ACTAACGCGC 
2701 GAGTCGGCAT CGGATCCACT ATGGGAGTGT CACCAGTATG CCCTAGGCGA CGCCGATGAG
2761 ACGATTACCA TCAATGTGGC AGGCAATGCG GGGGCAAGTA GTTCCGTGCT GCCGATGCTT
2821 AAAAGTCATC AAGATGCCTT TCCTCCCGCG AATTATATTG GCACCGAAGA CGTTGCAATA 
2881 CACCGCCTTG ATTCGGTTGC ATCAGAATTT CTGAACCCTA CCGATGTTAC TTTCCTGAAG
2941 ATCGACGTAC AGGGTTTCGA GAAGCAGGTT ATCGCGGGCA GTAAGTCAAC GCTTAACGAA
3001 AGCTGCGTCG GCATGCAACT CGAACTTTCT TTTATTCCGT TGTACGAAGG TGACATGCTG
3061 ATTCATGAAG CGCTTGAACT TGTCTATTCC CTAGGTTTCA GACTGACGGG TTTGTTGCCC
3121 GGATTTACGG ATCCGCGCAA TGGTCGAATG CTTCAAGCTG ACGGCATTTT CTTCCGTGGG 
3181 GACGATTGAC ATAAATGCTT GCGTCGGCAC CCTGCCGGTA TCCAAACGGG CGATCTGGTG 
3241 AGCCGGCCTC CCGGGCACCT AATCGACTAT CTAAATTGAG GCGGCCGCGA CGTGCGGCAC
3301 GAACAGGTGG CCGGCTGCTA GCGTTACACA CGTCATGACT GCGCCAGTGT TCTCGATAAT
3361 TATCCCTACC TTCAATGCAG CGGTGACGCT GCAAGCCTGC CTCGGAAGCA TCGTCGGGCA
3421 GACCTACCGG GAAGTGGAAG TGGTCCTTGT CGACGGCGGT TCGACCGATC GGACCCTCGA 
3481 CATCGCGAAC AGTTTCCGCC CGGAACTCGG CTCGCGACTG GTCGTTCACA GCGGGCCCGA 
3541 TGATGGCCCC TACGACGCCA TGAACCGCGG CGTCGGCGTA GCCACAGGCG AATGGGTACT
3601 TTTTTTAGGC GCCGACGACA CCCTCTACGA ACCAACCACG TTGGCCCAGG TAGCCGCTTT
3661 TCTCGGCGAC CATGCGGCAA GCCATCTTGT CTATGGCGAT GTTGTGATGC GTTCGACGAA 
3721 AAGCCGGCAT GCCGGACCTT TCGACCTCGA CCGCCTCCTA TTTGAGACGA ATTTGTGCCA 
3781 CCAATCGATC TTTTACCGCC GTGAGCTTTT CGACGGCATC GGCCCTTACA ACCTGCGCTA 
3841 CCGAGTCTGG GCGGACTGGG ACTTCAATAT TCGCTGCTTC TCCAACCCGG CGCTGATTAC 
3901 CCGCTACATG GACGTCGTGA TTTCCGAATA CAACGACATG ACCGGCTTCA GCATGAGGCA
3961 GGGGACTGAT AAAGAGTTCA GAAAACGGCT GCCAATGTAC TTCTGGGTTG CAGGGTGGGA 
4021 GACTTGCAGG CGCATGCTGG CGTTTTTGAA AGACAAGGAG AATCGCCGTC TGGCCTTGCG 
4081 TACGCGGTTG ATAAGGGTTA AGGCCGTCTC CAAAGAACGA AGCGCAGAAC CGTAGTCGCG 
4141 GATCCACATT GGACTTCTTT AACGCGTTTG CGTCCTGATC CACCTTTCAA CCCCGTTCCG 
4201 CGTGACGCGG CGCGCAGAGA GTGGTCGCAT ATCGCGTCAC TGTTCTCGTG CCAGTGCTTG
4261 GAAAGCGTCG AGCACTCTGG TTCGCGTTCT TGACGTTCGC GCCCGCCCCT AGAGGTAGCG 
4321 TGTCACGTGA CTGAAGCCAA TGAGTGCAAC TCGGCGTCGC GAAAGGTTTC AGTCGCGGTT 
4381 GAGCAAGACA CCGCAAGACT ACTGGAGTGC GTGCACAAGC GCCTCCAGCT CACGG! 



Seq. ID No. 5 

1 atgatcgctg tgatctggtc ggcggtgccg acaggaaccg tcgacttgtc gacgatcacc
61 ttgtaccggt cgatgtatga cccaatgtcg tccgcaaccg agaagacgta cgtcaggtcc
121 gccgccccgc tttcacccat gggcgtcggg acggcgatga aaatgacgtc cgcgtgctcg
181 attccgcgtt gccggtcggt ggtgaagtca atcagcccgt tctcacggtt cctcgcaatc
241 aactcccaac ccgggctcga aaatcgggac actgcctgcg aggagcaaat cgatcttggc
301 ctgatcgata tcgacacaga cgacatcgtt gccgctatcc gcgagacagg cgcccgtgac
361 gaggcctaca tagcctga



Seq. ID No. 6

MIAVIWSAVPTGTVDLSTITLYRSMYDPMS
SATEKTYVRSAAPLSPMGVGTAMKMTSACS 
IPRCSSVVKSISPFSRFLAINSQPGLENRD

Seq. ID No. 7

1 gtgtcatctg ctccaaccgt gtcggtgata acgatttcgc tgaacgatct cgagggattg
61 aaaagcaccg tggagagcgt tcgcgcgcag cgctatgggg ggcgaatcga gcacatcgtc
121 atcgacggtg gatcgggcga cgccgtcgtg gagtatctgt ccggcgatcc tggctttgca
181 tattggcaat ctcagcccga caacgggaga tatgacgcga tgaatcaggg cattgcccat
241 tcgtcgggcg acctgttgtg gtttatgcac tccacggatc gtttctccga tccagatgca
301 gtcgcttccg tggtggaggc gctctcgggg catggaccag tacgtgattt gtggggttac
361 gggaaaaaca accttgtcgg actcgacggc aaaccacttt tccctcggcc gcacggctat
421 atgccgttta agatgcggaa atttctgctc ggcgcgacgg ttgcgcatca ggcgacattc
481 ttcggcgcgt cgctggtagc caagttgggc ggttacgatc ttgattttgg aatcgaggcg
541 gaccagctgt tcatctaccg tgccgcacca atacggcctc ccgtcacgat cgaccgcgtg
601 gtttgcgact tcgatgtcac gggacctggt tcaacccagc ccatccgtga gcactatcgg
661 accctgcggc ggctctggga cctgcatggc gactacccgc tgggtgggcg cagagcgccg
721 tgggcttact tgcgtgtgaa ggagtacttg attcgggccg acctggccgc attcaacgcg
781 gtaaagttct tgcgagcgaa gttcgccaga gcttcgcgga agcaaaattc atag

Seq. ID No. 9

1 gtgaagcgag cgcttataac agggatcacg gggcaggatg gttcctacct cgccgagcta
61 ctactgagca agggatacga ggttcacggg ctcgttcgtc gagcttcgac gtttaacacg
121 tcgcggatcg atcacctcta cgttgaccca caccaaccgg gcgcgcgctt gttcttgcac
181 tatgcagacc tcactgacgg cacccggttg gtgaccctgc tcagcagtat cgacccggat
241 gaggtctaca acctcgcagc gcagtcccat gtgcgcgtca gctttgacga gccagtgcat
301 accggagaca ccaccggcat gggatcgatc cgacttctgg aagcagtccg cctttctcgg
361 gtggactgcc ggttctatca ggcttcctcg tcggagatgt tcggcgcatc tccgccaccg
421 cagaacgaat cgacgccgtt ccatccccgt tcgccatacg gcgcggccaa ggtcttctcg
481 tactggacga ctcgcaacta tcgagaggcg tacggattat tcgcagtgaa tggcatcttg
541 ttcaaccatg agtccccccg gcgcggcgag actttcgtga cccgaaagat cacgcgtgcc
601 gtggcgcgca tccgagctgg cgtccaatcg gaggtctata tgggcaacct cgatgcgacc
661 cgcgactggg gctacgcgcc cgaatatgtc gaggggatgt ggaggatgtt gcaagcgcct
721 gaacctgatg actacgtcct ggcgacaggg cgtggttaca ccgtacgtga gttcgctcaa
781 gctgcttttg accatgtcgg gctcgaccgg caaaagcgcg tcaagtttga cgaccgctat
841 ttgcgtccca ccgaggtcga ttcgctagta ggagatgccg acaaggcggc ccagtcactc
901 ggctggaaag cttcggttca tactggtgaa ctcgcgcgca tcatggtgga cgcggacatc
961 gccgcgttgg agtgcgatgg cacaccatgg atcgacacgc cgatgttgcc tggttggggc
1021 agagtaagtt ga

Seq. ID No. 10

1 VKRALITGITGQDGSYLAELLLSKGYEVHG 

31 LVRRASTFKTSRIDHLYVDPHQPGARLFLH 

61 YADLTDGTRLVTLLSSIDPDEVYNLAAQSH 

91 VRVSFDEPVHTGDTTGMGSIRLLEAVRLSR 

121 VDCRFYQASSSEMFGASPPPQNESTPFYPR 

151 SPYGAAKVFSYWTTRNYREAYGLFAVNGIL 

181 FNHESPRRGETFVTRKITRAVARIRAGVQS

211 EVYMGNLDAIRDWGYAPEYVEGMWRMLQAP 

241 EPDDYVLATGRGYTVREFAQAAFDHVGLDW 

271 QKRVKFDDRYLRPTEVDSLVGDADKAAQSL 

301 GWKASVHTGELARIMVDADIAALECDGTPW 

331 IDTPMLPGWGRVS 



Seq. ID No. 11 

1 gtgaagcgag cgcttataac agggatcacg gggcaggatg gttcctacct cgccgagcta
61 ctactgagca agggatacga ggttcacggg ctcgttcgtc gagcttcgac gtttaacacg 
121 tcgcggatcg atcacctcta cgttgaccca caccaaccgg gcgcgcgctt gttcttgcac
181 tatgcagacc tcactgacgg cacccggttg gtgaccctgc tcagcagtat cgacccggat 
241 gaggtctaca acctcgcagc gcagtcccat gtgcgcgtca gctttgacga gccagtgcat
301 accggagaca ccaccggcat gggatcgacc cgacttctgg aagcagtccg cctttctcgg
361 gtggactgcc ggttctatca ggcttcctcg tcggagatgt tcggcgcatc tccgccaccg 
421 cagaacgaat cgacgccgtt ctatccccgt tcgccatacg gcgcggccaa ggtcttctcg 
481 tactggacga ctcgcaacta tcgagaggcg tacggattat tcgcagtgaa tggcatcttg 
541 ttcaaccatg agtccccccg gcgcggcgag actttcgtga cccgaaagat cacgcgtgcc 
601 gtggcgcgca tccgagctgg cgtccaatcg gaggtctata tgggcaacct cgatgcgatc 
661 cgcgactggg gctacgcgcc cgaatatgtc gaggggatgt ggaggatgtt gcaagcgcct 
721 gaacctgatg actacgtcct ggcgacaggg cgtggttaca ccgtacgtga gttcgctcaa 
781 gctgcttttg accacgtcgg gctcgactgg caaaagcacg tcaagtttga cgaccgctat 
841 ttgcgcccca ccgaggtcga ttcgctagta ggagatgccg acagggcggc ccagtcactc 
901 ggctggaaag cttcggttca tactggtgaa ctcgcgcgca tcatggtgga cgcggacatc 
961 gccgcgtcgg agtgcgatgg cacaccatgg atcgacacgc cgatgttgcc tggttggggc 
1021 ggagtaagtt ga 



Seq. ID No. 12 

1 VKRALITGITGQDGSYLAELLLSKGYEVHG
31 LVRRASTFNTSRIDHLYVDPHQPGARLFLH
61 YADLTDGTRLVTLLSSIDPDEVYNLAAQSH
91 VRVSFDEPVHTGDTTGMGSIRLLEAVRLSR
121 VDCRFYQASSSEMFGASPPPQNESTPFYPR
151 SPYGAAKVFSYWTTRNYREAYGLFAVNGIL
181 FNHESPRRGETFVTRKITRAVARIRAGVQS
211 EVYMGNLDAIRDWGYAPEYVEGMWRMLQAP
241 EPDDYVLATGRGYTVREFAQAAFDHVGLDW
271 QKHVKFDDRYLRPTEVDSLVGDADRAAQSL
301 GWKASVHTGELARIMVDADIAASECDGTPW
331 IDTPMLPGWGGVS

Seq. ID No. 13

1 gtgcgatggc acaccatgga tcgacacgcc gatgttgcct ggttggggca gagtaagttg 
61 acgactacac ctgggcctct ggaccgcgca acgcccgtgt atatcgccgg tcatcggggg 
121 ctggtcggct cagcgctcgt acgtagattt gaggccgagg ggttcaccaa tctcattgtg
181 cgatcacgcg atgagattga tctgacggac cgagccgcaa cgtttgattt tgtgtctgag

241 acaagaccac aggtgatcat cgatgcggcc gcacgggtcg gcggcatcat ggcgaataac 
301 acctatcccg cggacttctt gtccgaaaac ctccgaatcc agaccaattt gctcgacgca 
361 gctgtcgccg tgcgtgtgcc gcggctcctt ttcctcggtt cgtcatgcat ctacccgaag 
421 tacgctccgc aacctatcca cgagagtgct tcattgactg gccctttgga gcccaccaac 

481 gacgcgtatg cgatcgccaa gatcgccggt atcctgcaag ttcaggcggt taggcgccaa

541 tatgggctgg cgtggatctc tgcgatgccg actaacctct acggacccgg cgacaacttc 
601 tccccgtccg ggtcgcatct cttgccggcg ctcatccgtc gatatgagga agccaaagct 
661 ggtggtgcag aagaggtgac gaattggggg accggtactc cgcggcgcga acttctgcat 
721 gtcgacgatc tggcgagcgc atgcctgttc cttttggaac atttcgatgg tccgaaccac
781 gtcaacgtgg gcaccggcgt cgatcacagc attagcgaga tcgcagacat ggtcgctaca

841 gcggtgggct acatcggcga aacacgttgg gatccaacta aacccgacgg aaccccgcgc 
901 aaactattgg acgtctccgc gctacgcgag ttgggttggc gcccgcgaat cgcactgaaa 
961 gacggcatcg atgcaacggt gtcgtggtac cgcacaaatg ccgatgccgt gaggaggtaa 



Seq. ID No. 14

DVAWLGQSKLTTTPGPLDRA
LVGSALVRRFEAEGFTNLIV 
RAATFDFVSETRPQVIIDAA 
TYPADFLSENLRIQTNLLDA 
FLGSSCIYPKYAPQPIHESA 
DAYAIAKIAGILQVQAVRRQ 
TNLYGPGDNFSPSGSHLLPA 
GGAEEVTNWGTGTPRRELLH 
LLEHFDGPNHVNVGTGVDHS 
AVGYIGETRWDPTKPDGTPR 
LGWRPRIALKDGIDATVSWY

Seq. ID No. 15

1 gtgcgatggc acaccatgga tcgacacgcc gatgttgcct ggttggggcg gagtaagttg 
61 acgactacac ctgggcctct ggaccgcgca acgcccgtgt atatcgccgg tcatcggggg
121 ctggtcggct cagcgctcgt acgtagattt gaggccgagg ggttcaccaa tctcattgtg
181 cgatcacgcg atgagattga tctgacggac cgagccgcaa cgtttgattt tgtgtctgag

241 acaagaccac aggtgatcat cgatgcggcc gcacgggtcg gcggcatcat ggcgaataac 
301 acctatcccg cggacttctt gtccgaaaac ctccgaatcc agaccaattt gctcgacgca 
361 gctgtcgccg tgcgtgtgcc gcggctcctt ttcctcggtt cgtcatgcat ctacccgaag 
421 tacgctccgc aacctatcca cgagagtgct ttattgactg gccctttgga gcccaccaac 

481 gacgcgtatg cgatcgccaa gatcgccggt atcctgcaag ttcaggcggt taggcgccaa

541 tatgggctgg cgtggatctc tgcgatgccg actaacctct acggacccgg cgacaacttc 
601 tccccgtccg ggtcgcatct cttgccggcg ctcacccgtc gatatgagga agccaaagct 
661 ggtggtgcag aagaggtgac gaattggggg accggtactc cgcggcgcga acttctgcat 
721 gtcgacgatc tggcgagcgc atgcctgttc cttttggaac atttcgatgg tccgaaccac 

781 gtcaacgtgg gcaccggcgt cgatcacagc attagcgaga tcgcagacat ggtcgctacg

841 gcggtgggct acatcggcga aacacgttgg gatccaacta aacccgatgg aaccccgcgc 
901 aaactattgg acgtctccgc gctacgcgag ttgggttggc gcccgcgaat cgcactgaaa 
961 gacggcatcg atgcaacggt gtcgtggtac cgcacaaatg ccgatgccgt gaggaggtaa

Seq. ID No. 17

1 atggattttt tgcgcaacgc cggcttgatg gctcgtaacg ttagtaccga gatgctgcgc 
61 cacttcgaac gaaagcgcct attagtaaac caattcaaag catacggagt caacgttgtt

121 attgatgtcg gtgctaactc cggccagtcc ggtagcgctt tgcgtcgtgc aggattcaag 
181 agccgtatcg tttcctttga acctctttcg gggccatttg cgcaactaac gcgcaagtcg 
241 gcatcggatc cactatggga gtgtcaccag tatgccctag gcgacgccga tgagacgatt 
301 accatcaatg tggcaggcaa tgcgggggca agtagttccg tgctgccgat gcttaaaagt 
361 catcaagatg actttcctcc cgcgaattat atcggcaccg aagacgttgc aatacaccgc

421 cttgattcgg ttgcatcaga atttctgaac cctaccgatg ttactttcct gaagatcgac 
481 gtacagggtt tcgagaagca ggttatcacg ggcagtaagt caacgcttaa cgaaagctgc 
541 gtcggcatgc aactcgaact ttcttttatt ccgttgtacg aaggtgacat gctgattcat 
601 gaagcgcttg aacttgtcta ttccccaggt ttcagactga cgggtttgtt gcccggcttt 
661 acggatccgc gcaatggtcg aatgcttcaa gctgacggca ttttcttccg tggggacgat

721 tga

Seq. ID No. 18

Seq. ID No. 19

1 atggattttt tgcgcaacgc cggcttgatg gctcgtaacg ttagcaccga gatgctgcgc 
61 cacttcgaac gaaagcgcct attagcaaac caattcaaag catacggagt caacgttgtt 
121 attgatgtcg gtgctaactc cggccagttc ggtagcgctt tgcgtcgtgc aggattcaag 
181 agccgtatcg tttcctttga acctctttcg gggccatttg cgcaactaac gcgcgagtcg 
241 gcatcggatc cactatggga gtgtcaccag tatgccctag gcgacgccga tgagacgatt 
301 accatcaatg tggcaggcaa tgcgggggca agtagttccg tgctgccgat gcttaaaagt
361 catcaagatg cctttcctcc cgcgaattat attggcaccg aagacgttgc aatacaccgc 
421 cttgattcgg ttgcatcaga atttctgaac cctaccgatg ttactttcct gaagatcgac
481 gtacagggtt tcgagaagca ggttatcgcg ggcagtaagt caacgcttaa cgaaagctgc
541 gtcggcatgc aactcgaact ttcttttatt ccgttgcacg aaggtgacat gctgattcat 
601 gaagcgcttg aacttgtcta ttccctaggt ttcagactga cgggttcgtt gcccggattt 
661 acggatccgc gcaatggtcg aatgcttcaa gctgacggca ttttcttccg tggggacgat
721 tga 



Seq. ID No. 20 

1 MDFLRNAGLMARNVSTEMLRHFERKRLLVN
31 QFKAYGVNVVIDVGANSGQFGSALRRAGFK
61 SRIVSFEPLSGPFAQLTRESASDPLWECHQ
91 YALGDADETITINVAGNAGASSSVLPMLKS
121 HQDAFPPANYIGTEDVAIHRLDSVASEFLN
151 PTDVTFLKIDVQGFEKQVIAGSKSTLNESC
181 VGMQLELSFIPLYEGDMLIHEALELVYSLG
211 FRLTGLLPGFTDPRNGRMLQADGIFFRGDD




Seq. ID No. 21 

1 atgactgcgc cagtgttctc gataattatc cctaccttca atgcagcggt gacgctgcaa 
61 gcctgcctcg gaagcatcgt cgggcagacc taccgggaag tggaagtggt ccttgtcgac 
121 ggcggttcga ccgatcggac cctcgacacc gcgaacagtt tccgcccgga actcggctcg 
181 cgactggtcg ttcacagcgg gcccgatgat ggcccctacg acgccatgaa ccgcggcgtc

241 ggcgtggcca caggcgaatg ggtacttttt ttaggcgccg acgacaccct ctacgaacca 
301 accacgttgg cccaggtagc cgcttttctc ggcgaccatg cggcaagcca tcttgtctat
361 ggcgatgttg tgatgcgttc gacgaaaagc cggcatgccg gacctttcga cctcgaccgc
421 ctcctatttg agacgaattt gtgccaccaa tcgatctttt accgccgtga gcttttcgac
481 ggcatcggcc cttacaacct gcgctaccga gtctgggcgg actgggactt caatattcgc

541 tgcttctcca acccggcgct gattacccgc tacatggacg tcgtgatttc cgaatacaac 
601 gacatgaacg gcttcagcat gaggcagggg actgataaag agttcagaaa acggctgcca 
661 atgtacttct gggttgcagg gtgggagact tgcaggcgca tgctggcgtc tttgaaagac 
721 aaggagaatc gccgtctggc cttgcgtacg cggttgataa gggttaaggc cgtctccaaa 
781 gaacgaagcg cagaaccgta g



Seq. ID No. 22

1 MTAPVFSIIIPTFNAAVTLQACLGSIVGQT 

31 YREVEVVLVDGGSTDRTLDIANSFRPELGS 

61 RLVVHSGPDDGPYDAMNRGVGVATGEWVLF 

91 LGADDTLYEPTTLAQVAAFLGDHAASHLVY

121 GDVVMRSTKSRHAGPFDLDRLLFETNLCHQ 

151 SIFYRRELFDGIGPYNLRYRVWADWDFNIR 

181 CFSNPALITRYMDVVISEYNDMTGFSMRQG 

211 TDKEFRKRLPMYFWVAGWETCRRMLAFLKD
241 KENRRLALRTRLIRVKAVSKERSAEP



Seq. ID No. 23 

1 atgactgcgc cagtgttctc gataattatc cctaccttca atgcagcggt gacgctgcaa 
61 gcctgcctcg gaagcatcgt cgggcagacc taccgggaag tggaagtggt ccttgtcgac 
121 ggcggttcga ccgatcggac cctcgacatc gcgaacagtt tccgcccgga actcggctcg 

181 cgactggtcg ttcacagcgg gcccgatgat ggcccctacg acgccatgaa ccgcggcgtc

241 ggcgtagcca caggcgaatg ggtacttttt ttaggcgccg acgacaccct ctacgaacca 
301 accacgttgg cccaggtagc cgcttttctc ggcgaccatg cggcaagcca tcttgtctat 
361 ggcgatgttg tgatgcgttc gacgaaaagc cggcatgccg gacctttcga cctcgaccgc 
421 ctcctatttg agacgaattt gtgccaccaa tcgatctttt accgccgtga gcttttcgac 

481 ggcatcggcc cttacaacct gcgctaccga gtctgggcgg actgggactt caatattcgc

541 tgcttctcca acccggcgct gattacccgc tacatggacg tcgtgatttc cgaatacaac 
601 gacatgaccg gcttcagcat gaggcagggg actgataaag agttcagaaa acggctgcca 
661 atgtacttct gggttgcagg gtgggagact tgcaggcgca tgctggcgtt tttgaaagac 
721 aaggagaatc gccgtctggc cttgcgtacg cggttgataa gggttaaggc cgtctccaaa 

781 gaacgaagcg cagaaccgta g

Seq. ID No. 24

1 MTAPVFSIIIPTFNAAVTLQACLGSIVGQT 
31 YREVEVVLVDGGSTDRTLDIANSFRPELGS 
61 RLVVHSGPDDGPYDAMNRGVGVATGEWVLF
91 LGADDTLYEPTTLAQVAAFLGDHAASHLVY 
121 GDVVMRSTKSRHAGPFDLDRLLFETNLCHQ 
151 SIFYRRELFDGIGPYNLRYRVWADWDFNIR 
181 CFSNPALITRYMDVVISEYNDMTGFSMRQG 
211 TDKEFRKRLPMYFWVAGWETCRRMLAFLKD 
241 KENRRLALRTRLIRVKAVSKERSAEP 



Seq. ID No. 25 

1 gtggccagca gaagtcccca ctccgctgcg ggtggttggc taattcttgg cggctccctt 
61 cttgtggtcg gcgtggcgca tccggtagga ctcgccggag gtgacgacga tgctggcgtg 
121 gtgcagcagc cgatcgagga tgctggcggc ggtggtgtgc tcgggcagga atcgccccca 
181 ttgttcgaag ggccaatgcg aggcgatggc cagggagcgg cgctcgtagc cggcagccac 
241 gagccggaac aacagttgag tcccggtgtc gtcgagcggg gcgaagccga tctcgtccaa 
301 gatgaccaga tccgcgcgga gcagggtgtc gatgatcttg ccgacggtgt tgtcggccag
361 gccgcggtag aggacctcga tcaggtcggc ggcggtgaag tagcggactt tgaatccggc 
421 gtggacggca gcgtgcccgc agccgatgag caggtgactt ttgcccgtac caggtgggcc 
481 aatgaccgcc aggttctgtt gtgcccgaat ccattccagg ctcgacaggt agtcgaacgt 
541 ggctgcggtg atcgacgatc cggtgacgtc gaacccgtcg agggtcttgg tgaccgggaa
601 ggctgcggcc ttgagacggt tggcggtgtt ggaggcatcg cgggcagcga tctcggcctc
661 aaccaacgtc cgcaggatct cctccggtgt ccagcgttgc gtcttggcga cttgcaacac 
721 ctcggcggcg ttgcggcgca ccgtggccag cttcaaccgc cgcagcgccg cgtcaaggtc
781 agcagccagc ggtgccgccg aggacggtgc caccggcttg gcagcggtgg tcatgaggcc
841 gtcccgtcgg tggtgttgac cttgtag 



Seq. ID No. 26 

1 VASRSPHSAAGGWLILGGSLLVVGVAHPVG
31 LAGGDDDAGVVQQPIEDAGGGGVLGQESPP
61 LFEGPMRGDGQGAALVAGSHEPEQQLSPGV
91 VERGEADLVQDDQIRAEQGVDDLADGVVGQ
121 AAVEDLDQVGGGEVADFESGVDGSVPAADE
151 QVTFARTRWANDRQVLLCPNPFQARQVVER
181 GCGDRRSGDVEPVEGLGDREGCGLETVGGV
211 GGIAGSDLGLNQRPQDLLRCPALRLGDLQH
241 LGGVAAHRGQLQPPQRRVKVSSQRCRRGRC
271 HRLGSGGHEAVPSVVLIL



Seq. ID No. 27 





1 atgggctgcc tcaaaggtgg tgtcgtcgcc aatgttgttg ttccaacacc ggattatgtg
61 cgattcgcgt cccactatgg cttcgttccg gacttctgcc acggtgcgga tccgcaatcg
121 aagggcatcg tggagaacct ctgtggctac gctcaggacg accttgcggt gccgctgctg
181 accgaagctg cgttagccgg tgagcaggtc gacctacgtg ccctcaacgc ccaggcgcaa
241 ctatggtgcg ccgaggtcaa tgccacggtc cactcggaga tctgcgccgt gcccaacgat
301 cgcttggttg acgagcgcac cgtcttgagg gagctgccct cgctgcggcc gacgatcggc
361 tcggggtcgg tgcgccgtaa ggtcgacggc ctctcgtgca tccgttacgg ctcagctcgt
421 tactcggtgc cccagcggct cgtcggtgcc accgtggcgg tggtggtcga tcatggcgcc
481 ctgatcctgt tggaacctgc gaccggtgtg atcgtggccg agcacgagct cgtcagccca
541 ggtgaggtgt ccatcctcga tgaacactac gacggaccca gacccgcacc ctcgcgtggc
601 cctcgcccga aaacccaagc agagaaacga ttctgcgcat tgggaaccga agcgcagcag
661 ttcctcgtcg gtgctgctgc gatcggcaac acccgactga aatccgaact cgacattctg
721 ctcggccttg gcgccgccca cggcgaacag gctttgattg acgcgctgcg ccgggcggtt
781 gcgtttcgcc ggttccgcgc tgccgacgtg cgctcgatcc tggccgccgg cgccggcacc
841 ccacaacccc gccccgccgg cgacgcactc gtgctcgatc tgcccaccgt cgagacccgc
901 tcgttggagg cctacaagat caacaccacc gacgggacgg ccccatgacc accgccgcca
961 agccggtggc accgtcctcg gcggcaccgc tggctgctga ccttgacgcg gcgctgcggc
1021 ggttgaagct ggccacggtg cgccgcaacg ccgccgaggt gttgcaagtc gccaagacgc
1081 aacgctggac accggaggag atcctgcgga cgttggttga ggccgagatc gctgcccgcg
1141 atgcctccaa caccgccaac cgtctcaagg ccgcagcctt cccggtcacc aagaccctcg
1201 acgggttcga cgtcaccgga tcgtcgatca ccgcagccac gttcgactac ctgtcgagcc
1261 tggaatggat tcgggcacaa cagaacctgg cggtcattgg cccacctggt acgggcaaaa
1321 gtcacctgct catcggctgc gggcacgctg ccgtccacgc cggattcaaa gtccgctact
1381 tcaccgccgc cgacctgatc gaggtcctct accgcggcct ggccgacaac accgtcggca
1441 agatcatcga caccccgctc cgcgcggatc tggtcatctt ggacgagatc ggcttcgccc
1501 cgctcgacga caccgggact caactgttgt tccggctcgt ggctgccggc tacgagcgcc
1561 gctccctggc catcgcctcg cattggccct tcgaacaatg ggggcgattc ctgcccgagc
1621 acaccaccgc cgccagcatc ctcgatcggc tgctgcacca cgccagcatc gtcgtcacct
1681 ccggcgagtc ctaccggatg cgccacgccg accacaagaa gggagccgcc aagaattag



Seq. ID No. 28 



1 MGCLKGGVVANVVVPTPDYVRFASHYGFVP
31 DFCHGADPQSKGIVENLCGYAQDDLAVPLL
61 TEAALAGEQVDLRALNAQAQLWCAEVNATV
91 HSEICAVPNDRLVDERTVLRELPSLRPTIG
121 SGSVRRKVDGLSCIRYGSARYSVPQRLVGA
151 TVAVVVDHGALILLEPATGVIVAEHELVSP
181 QEVSILDEHYDGPRPAPSRGPRPKTQAEKR
211 FCALGTEAQQFLVGAAAIGNTRLKSELDIL
241 LGLGAAHGEQALIDALRRAVAFRRFRAADV
271 RSILAAGAGTPQPRPAGDALVLDLPTVETR
301 SLEAYKINTTDGTAS

Seq. ID No. 29



1 MTTAAKPVAPSSAAPLAADLDAALRRLKLA
31 TVRRNAAEVLQVAKTQRWTPEEILRTLVEA
61 EIAARDASNTANRLKAAAFPVTKTLDGFDV
91 TGSSITAATFDYLSSLEWIRAQQNLAVIGP
121 PGTGKSHLLIGCGHAAVHAGFKVRYFTAAD
151 LIEVLYRGLADNTVGKIIDTLLRADLVILD
181 EIGFAPLDDTGTQLLFRLVAAGYERRSLAI
211 ASHHPFEQWGRFLPEHTTAASILDRLLHHA
241 SIVVTSGESYRMRHADHKKGAAKN















Seq. ID No. 30

1 gtgacgtctg ctccgaccgt ctcggtgata acgatctcgt tcaacgacct cgacgggttg
61 cagcgcacgg tgaaaagtgt gcgggcgcaa cgctaccggg gacgcatcga gcacatcgta
121 accgacggtg gcagcggcga cgacgtggtg gcatacctgt ccgggtgtga accaggcttc
181 gcgtattggc agtccgagcc cgacggcggg cggtacgacg cgatgaacca gggcatcgcg
241 cacgcatcgg gtgatctgtt gtggttcttg cactccgccg atcgtttttc cgggcccgac
301 gCggtagccc aggccgtgga ggcgctatcc ggcaagggac cggtgtccga attgtggggc
361 Ctcgggatgg atcgtctcgt cgggctcgat cgggtgcgcg gcccgatacc tttcagcctg
421 cgcaaattcc tggccggcaa gcaggttgtt ccgcatcaag catcgtcctt cggatcatcg
481 ctggtggcca agatcggtgg ctacgacctt gatttcggga tcgccgccga ccaggaattc
541 atattgcggg ccgcgctggt atgcgagccg gtcacgattc ggtgtgtgct gtgcgagttc
601 gacaccacgg gcgtcggctc gcaccgggaa ccaagcgcgg tcttcggtga tctgcgccgc
661 atgggcgacc ttcatcgccg ctacccgttc gggggaaggc gaatatcaca tgcctaccta
721 cgcggccggg agttctacgc ctacaacagt cgattctggg aaaacgccct cacgcgaatg
781 tcgaaatag



Seq. ID No. 31 



1 MTSAPTVSVITISFNDLDGLQRTVKSVRAQ
31 RYRGRIEHIVIDGGSGDDVVAYLSGCEPGF
61 AYWQSEPDGGRYDAMNQGIAHASGDLLWFL
91 HSADRFSGPDVVAQAVEALSGKGPVSELWG
121 FGMDRLVGLDRVRGPIPFSLRKFLAGKQVV
151 PHQASFFGSSLVAKIGGYDLDFGIAADQEF
181 ILRAALVCEPVTIECVLCEFDTTGVGSHRE
211 PSAVFGDLRRMGDLHRRYPFGGRRISHAYL
241 RGREFYAYNSRFWENVFTRMSK

Seq. ID No. 32

1 gtgaagcgag cgctcatcac cggaatcacc ggccaggacg gctcgCatct cgccgaactg 
61 ctgctggcca aggggtatga ggttcacggg ctcatccggc gcgcttcgac gttcaacacc 
121 tcgcggatcg atcacctcta cgtcgacccg caccaaccgg gcgcgcggct gtttctgcac 
181 tatggtgacc tgatcgacgg aacccggttg gtgaccctgc tgagcaccat cgaacccgac

241 gaggtgtaca acctggcggc gcagtcacac gtgcgggtga gcttcgacga acccgtgcac 
301 accggtgaca ccaccggcat gggatccatg cgactgctgg aagccgttcg gctctctcgg 
361 gtgcactgcc gcttctatca ggcgtcctcg tcggagatgt tcggcgcctc gccgccaccg 
421 cagaacgagc tgacgccgtt ctacccgcgg tcaccgtatg gcgccgccaa ggtctattcg 
481 tactgggcga cccgcaatta tcgcgaagcg tacggattgt tcgccgttaa cggcatcttg

541 ttcaatcacg aatcaccgcg gcgcggtgag acgttcgtga cccgaaagat caccagggcc 
601 gtggcacgca tcaaggccgg tatccagtcc gaggtctata tgggcaatct ggatgcggtc 
661 cgcgactggg ggtacgcgcc cgaatacgtc gaaggcatgt ggcggacgct gcagaccgac 
721 gagcccgacg acttcgtttt ggcgaccggg cgcggtttca ccgtgcgtga gttcgcgcgg 
781 gccgcgttcg agcatgccgg tttggactgg cagcagtacg tgaaattcga ccaacgctat

841 ctgcggccca ccgaggtgga ttcgctgatc ggcgacgcga ccaaggctgc cgaattgctg 
901 ggctggaggg cttcggtgca cactgacgag ttggctcgga tcatggtcga cgcggacatg 
961 gcggcgctgg agtgcgaagg caagccgtgg atcgacaagc cgatgatcgc cggccggaca 
1021 tga 



Seq. ID No. 33

1 MKRALITGITGQDGSYLAELLLAKGYEVHG
31 LIREASTFNTSRIDHLYVDPHQPGARLFLH
61 YGDLIDGTRLVTLLSTIEPDEVYHLAAQSH
91 VRVSFDEPVHTGDTTGMGSMRLLEAVRLSR
121 VHCRFYQASSSEMFGASPPPQNELTPFYPR
151 SPYGAAKVYSYWATRNYREAYGLFAVHGIL
181 FMHESPRRGETFVTRKITRAVARIKAGIQS
211 EVYMGNLDAVRDWGYAPEYVEGMWRMLQTD
241 EPDDFVLATGRGFTVREFARAAFEHAGLDW
271 QQYVKFDQRYLRPTEVDSLIGDATKAAELL
301 GWRASVHTDELARIMVDADMAALECEGKPW
331 IDKPMIAGRT



Seq. ID No. 34 





1 atgaggctgg cccgtcgcgc tcggaacatc ttgcgtcgca acggcatcga ggtgtcgcgc
61 tactttgccg aactggactg ggaacgcaat ttcttgcgcc aactgcaatc gcatcgggtc
121 agtgccgtgc tcgatgtcgg ggccaattcg gggcagtacg ccaggggtct gcgcggcgcg
181 ggcttcgcgg gccgcatcgt ctcgttcgag ccgctgcccg ggccctttgc cgtcttgcag
241 cgcagcgcct ccacggaccc gttgtgggaa tgccggcgct gtgcgctggg cgatgtcgat
301 ggaaccatct cgatcaacgt cgccggcaac gagggcgcca gcagttccgt cttgccgatg
361 ttgaaacgac atcaggacgc ctttccacca gccaactacg tgggcgccca acgggtgccg
421 atacatcgac tcgattccgt ggctgcagac gttctgcggc ccaacgatat tgcgttctcg
481 aagatcgacg tccaaggatt cgagaagcag gtgatcgcgg gcggcgattc aacggtgcac
541 gaccgatgcg tcggcatgca gctcgagccg tccttccagc cgctgtacga gggtggcatg
601 ctcacccgcg aggcgctcga tctcgtggat tcgttgggct ttacgctctc gggattgcaa
661 cccggtttca ccgacccccg caacggtcga atgctgcagg ccgatggcat cttcttccgg
721 ggcagcgatt ga

Seq. ID No. 35



1 MRLARRARNILRRNGIEVSRYFA
31 FLRQLQSHRVSAVLDVGANSGQY
61 GFAGRIVSFEPLPGPFAVLQRSA
91 CRRCALGDVDGTISINVAGNEGA
121 LKRHQDAFPPANYVGAQRVPIHR
151 VLRPNDIAFLKIDVQGFEKQVIA
181 DRCVGMQLELSFQPLYEGGMLIR
211 SLGFTLSGLQPGFTDPRNGRMLQ
241 GSD

Seq. ID No. 36

1 gtgaaatcgt tgaaactcgc tcgttccatc gcgcgtagcg ccgccttcga ggtttcgcgc
61 cgctattctg agcgagacct gaagcaccag tttgtgaagc aactcaaatc gcgtcgggta 
121 gatgtcgttt tcgatgtcgg cgccaactca ggacaatacg ccgccggcct ccgccgagca 

181 gcatataagg gccgcattgt ctcgttcgaa ccgctatccg gaccgtttac gatcttggaa

241 agcaaagcgt caacggatcc actttgggat tgccggcagc atgcgtcggg cgattctgat
301 ggaacggtta cgatcaatac cgcaggaaac gccggtcaga gcagttccgt cttgcccatg
361 ctgaaaagtc atcagaacgc ttttcccccg gcaaactatg tcggtaccca agaggcgtcc 
421 atacatcgac ttgattccgt ggcgccagaa tttctaggca tgaacggtgt cgcttttctc
481 aaggtcgacg ttcaaggctt tgaaaagcag gtgctcgccg ggggcaaatc aaccatagat

541 gaccaccgcg tcggcatgca actcgaactg tccttcctgc cgttgtacga aggtggcatg 
601 ctcattcctg aagccctcga tctcgtgtat tccttgggct tcacgttgac gggattgctg 
661 ccttgtttca ttgatgcaaa taatggtcga atgttgcagg ccgacggcat ctttttccgc 
721 gaggacgatt ga

Seq. ID No. 38

1 atggtgcaga cgaaacgata cgccggcttg accgcagcta acacaaagaa agtcgccatg 
61 gccgcaccaa tgttttcgat catcatcccc accttgaacg tggctgcggt attgcctgcc 
121 tgcctcgaca gcatcgcccg tcagacctgc ggtgacttcg agctggtact ggtcgacggc 
181 ggctcgacgg acgaaaccct cgacatcgcc aacattttcg cccccaacct cggcgagcgg
241 ttgatcattc atcgcgacac cgaccagggc gtctacgacg ccatgaaccg cggcgtggac
301 ctggccaccg gaacgtggtt gctctttctg ggcgcggacg acagcctgta cgaggctgac 
361 accctggcgc gggtggccgc cttcattggc gaacacgagc ccagcgatct ggtatatggc 
421 gacgtgatca tgcgctcciac caatttccgc tggggtggcg ccttcgacct cgaccgtctg 
481 ttgttcaagc gcaacatctg ccatcaggcg atcttctacc gccgcggact cttcggcacc

541 atcggtccct acaacccccg ctaccgggtc ctggccgact gggacttcaa tattcgctgc 
601 ttttccaacc cagcgctcgt cacccgctac atgcacgtgg tcgttgcaag ctacaacgaa
661 ttcggcgggc tcagcaatac gatcgtcgac aaggagtttt tgaagcggct gccgatgtcc
721 acgagactcg gcataaggct ggtcatagtt ctggtgcgca ggtggccaaa ggtgatcagc 
781 agggccatgg taatgcgcac cgtcatttct tggcggcgcc gacgttag

Seq 40:

GATGCCGTGAGGAGGTAAAGCTGC

Seq 41:

GATACGGCTCTTGAATCCTGCACG

CLAIMS

1. A polypeptide in substantially isolated form which
comprises a sequence selected from the sequences of 
Seq.ID.No: 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28 and 
29, or a polypeptide substantially homologous thereto. 

2. A polypeptide in substantially isolated form which 
comprises a sequence selected from the sequences of
Seq.ID.No: 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28 and
29.

3. A polypeptide which comprises a fragment of a 
polypeptide defined in claim 1 or 2, said fragment 
comprising at least 12 amino acids and an epitope. 

4. A polynucleotide in substantially isolated form which 
encodes a polypeptide according to any one of claims 1 to
3.

5. A polynucleotide in substantially isolated form which 
is capable of selectively hybridizing to Seq.ID.No: 3 or 4 
or a fragment thereof. 

6. A polynucleotide fragment according to claim 5 which 
comprises a sequence selected from the sequences of 
Seq.ID.No: 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25 and 27, 
or a polynucleotide at least 90% homologous thereto. 

7. A polynucleotide in substantially isolated form 
comprising a sequence selected from the sequences of 
Seq.ID.No: 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25 and 27. 

8. A polynucleotide probe which comprises a fragment of 
at least 15 nucleotides of a polynucleotide as defined in 
any one of claims 4 to 7, optionally carrying a revealing 
label. 

9. A recombinant vector carrying a polynucleotide as 
defined in any one of claims 4 to 7. 

10. An antibody capable of binding a polypeptide or 
fragment thereof as defined in any one of claims 1 to 3. 

11. An antibody capable of binding a polypeptide or 
fragment thereof wherein the polypeptide is a polypeptide 
which comprises a sequence selected from the sequences of 
Seq.ID.No: 31, 33, 35, 37 and 39 or is a peptide 
substantially homologous thereto. 

12. A test kit for detecting the presence or absence of a 
pathogenic mycobacterium in a sample which comprises a 
polynucleotide according to any one of claims 4 to 8, a 
polypeptide according to any one of claims 1 to 3, a 
polypeptide which comprises a sequence selected from the 
sequences of Seq.ID.No: 31, 33, 35, 37 and 39 or a 
polypeptide substantially homologous thereto, or an 
antibody according to any one of claims 10 or 11. 

13. A method of detecting the presence or absence of 
antibodies in an animal or human, against a pathogenic 
mycobacteria in a sample which comprises: 

(a) providing a polypeptide according to any one of 
claims 1 to 3 or a polypeptide which comprises a 
sequence selected from the sequences of 
Seq.ID.No: 31, 33, 35, 37 and 39 or a polypeptide 
substantially homologous thereto, which 
comprises an epitope; 

(b) incubating a biological sample with said 
polypeptide under conditions which allow for the 
formation of an antibody-antigen complex; and 

(c) determining whether antibody-antigen complex 
comprising said polypeptide is formed. 

14. A method of detecting the presence or absence of a 
polypeptide according to any one of claims 1 to 3 or a 
polypeptide which comprises a sequence selected from the 
sequences of Seq.ID.No: 31, 33, 35, 37 and 39 or a 
polypeptide substantially homologous thereto in a 
biological sample, which method comprises: 

(a) providing an antibody according to any one of 
claims 10 and 11; 

(b) incubating a biological sample with said antibody 
under conditions which allow for the formation of 
an antibody-antigen complex; and 

(c) determining whether antibody-antigen complex 
comprising said antibody is formed. 

15. A method of detecting the presence or absence of cell 
mediated immune reactivity in an animal or human, to a 
polypeptide according to claims 1 to 3 or a polypeptide 
which comprises a sequence selected from the sequences of 
Seq.ID.No: 31, 33, 35, 37 and 39 or a polypeptide 
substantially homologous thereto, which method comprises: 

(a) providing a polypeptide according to any one of 
claims 1 to 3 or a polypeptide which comprises a 
sequence selected from the sequences of 
Seq.ID.No: 31, 33, 35, 37 and 39 or a polypeptide 
substantially homologous thereto, which comprises 
an epitope; 

(b) incubating a cell sample with said polypeptide 
under conditions which allow for a cellular 
immune response such as release of cytokines or 
other mediator or reaction to occur; and 

(c) detecting the presence of said cytokine or 
mediator or cellular response in the incubate. 

16. A pharmaceutical composition comprising a polypeptide 
according to any one of claims 1 to 3 in a suitable carrier 
or diluent. 

17. A composition according to claim 16 or a composition 
comprising a polypeptide which comprises a sequence 
selected from the sequences of Seq.ID.No: 31, 33, 35, 37 
and 39 or a polypeptide substantially homologous thereto, 
for use in the treatment or prevention of diseases caused 
by mycobacteria. 

18. A method of treating or preventing mycobacterial 
disease in an animal or human caused by mycobacteria which 
express a polypeptide according to claims 1 to 3 or a 
polypeptide which comprises a sequence selected from the 
sequences of Seq.ID.No: 31, 33, 35, 37 and 39 or a 
polypeptide substantially homologous thereto, which method 
comprises vaccinating or treating an animal or human with 
an effective amount of said polypeptide. 

19. A method of treating or preventing mycobacterial 
diseases in animals or humans caused by mycobacteria 
containing the polynucleotide of Seq.ID.No: 3 or 4, which 
method comprises vaccinating or treating an animal or human 
with an effective amount of a polynucleotide according to 
claims 4 to 7, a vector according to claim 9 or a 
polynucleotide which encodes a polypeptide which comprises 
a sequence selected from the sequences of Seq.ID.No: 31, 
33, 35, 37 and 39 or a polypeptide substantially homologous 
thereto. 

20. A method according to claims 18 or 19 for increasing 
the in vivo susceptibility of mycobacteria to antimicrobial 
drugs. 

21. A normally pathogenic mycobacterium, whose 
pathogenicity is mediated in all or in part by the presence 
or the expression of a polypeptide as defined in any one of 
claims 1 to 3 or a polypeptide which comprises a sequence 
selected from the sequences of Seq.ID.No: 31, 33, 35, 37 
and 39 or a polypeptide substantially homologous thereto, 
which mycobacterium harbours an attenuating mutation in a 
gene encoding one of the said polypeptides. 

22. A vaccine comprising a mycobacterium as claimed in 
claim 21. 

23. A vaccine according to claim 22 wherein the 
mycobacteria is selected from Mavs, Mptb and Mtb. 

SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: St George's Hospital Medical School 

(B) STREET: Cranmer Terrace 

(C) CITY: London 

(E) COUNTRY: United Kingdom 

(F) POSTAL CODE (ZIP): SW17 0RE 

(ii) TITLE OF INVENTION: NOVEL POLYNUCLEOTIDES AND POLYPEPTIDES IN 
PATHOGENIC MYCOBACTERIA AND THEIR USE AS DIAGNOSTICS, 
VACCINES AND TARGETS FOR CHEMOTHERAPY 

(iii) NUMBER OF SEQUENCES: 41 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO) 

(v) CURRENT APPLICATION DATA: 

APPLICATION NUMBER: WO PCT/GB96/03221 

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 674 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 



GATCCAACTA AACCCGATGG AACCCCGCGC AAACTATTGG ACGTCTCCGC GCTACGCAGT 60 

TGGGTTGGCG CCCGCGAATC GCACTGAAAG AGGGCATCGA TGCAACGGTG TCGTGGTACC 120 

GCACAAATGC CGATGCCGTG AGGAGGTAAA GCTGCGGGCC GGCCGATGTT ATCCCTCCGG 180 

CCGGACGGGT AGGGCGACCT GCCATCGAGT GGTACGGCAG TCGCCTGGCC GGCGAGGCGC 240 

ATGGCCTATG TGAGTATCCC ATAGCCTGGC TTGGCTCGCC CCTACGCATT ATCAGTTGAC 300 

CGCTTTCGCG CCACGTCGCA GGCTTGCGGC AGCATCCCGT TCAGGTCTCC TCATGGTCCG 360 

GTGTGGCACG ACCACGCAAG CTCGAACCGA CTCGTTTCCC AATTTCGCAT GCTAATATCG 420 

CTCGATGGAT TTTTTGCGCA ACGCCGGCTT GATGGCTCGT AACGTTAGCA CCGAGATGCT 480 

GCGCCACTCC GAACGAAAGC GCCTATTAGT AAACCAAGTC GAAGCATACG GAGTCAACGT 540 

TGTTATTGAT GTCGGTGCTA ACTCCGGCCA GTTCGGTAGC GCTTTGCGTC GTGCAGGATT 600 

CAAGAGCCGT ATCGTTTCCT TTGAACCTCT TTCGGGGCCA TTTGCGCAAC TAACGCGCAA 660 

GTCGGCATCG GATC 674 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 674 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

GATCCGATGC CGACTTGCGC GTTAGTTGCG CAAATGGCCC CGAAAGAGGT TCAAAGGAAA 60 

CGATACGGCT CTTGAATCCT GCACGACGCA AAGCGCTACC GAACTGGCCG GAGTTAGCAC 120 

CGACATCAAT AACAACGTTG ACTCCGTATG CTTCGACTTG GTTTACTAAT AGGCGCTTTC 180 

GTTCGGAGTG GCGCAGCATC TCGGTGCTAA CGTTACGAGC CATCAAGCCG GCGTTGCGCA 240 

AAAAATCCAT CGAGCGATAT TAGCATGCGA AATTGGGAAA CGAGTCGGTT CGAGCTTGCG 300 

TGGTCGTGCC ACACCGGACC ATGAGGAGAC CTGAACGGGA TGCTGCCGCA AGCCTGCGAC 360 

GTGGCGCGAA AGCGGTCAAC TGATAATGCG TAGGGGCGAG CCAAGCCAGG CTATGGGATA 420 

CTCACATAGG CCATGCGCCT CGCCGGCCAG GCGACTGCCG TACCACTCGA TGGCAGGTCG 480 

CCCTACCCGT CCGGCCGGAG GGATAACATC GGCCGGCCCG CAGCTTTACC TCCTCACGGC 540 

ATCGGCATTT GTGCGGTACC ACGACACCGT TGCATCGATG CCCTCTTTCA GTGCGATTCG 600 

CGGGCGCCAA CCCAACTGCG TAGCGCGGAG ACGTCCAATA GTTTGCGCGG GGTTCCATCG 660 



GGTTTAGTTG GATC 674 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7995 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

GAATTCTGGG TTGGAGACGA CGTCGAACTC CTGGTCGGTC TTGCTTCGAA TGATCGCTGT 60 

GATCTGGTCG GCGGTGCCGA CAGGAACCGT CGACTTGTCG ACGATCACCT TGTACCGGTC 120 

GATGTATGAC CCAATGTCGT CCGCAACCGA GAAGACGTAC GTCAGGTCCG CCGCCCCGCT 180 

TTCACCCATG GGCGTCGGGA CGGCGATGAA AATGACGTCC GCGTGCTCGA TTCCGCGTTG 240 

CCGGTCGGTG GTGAAGTCAA TCAGCCCGTT CTCACGGTTC CTCGCAATCA ACTCCCAACC 300 

CGGGCTCGAA AATCGGGACA CTGCCTGCGA GGAGCAAATC GATCTTGGCC TGATCGATAT 360 

CGACACAGAC GACATCGTTG CCGCTATCCG CGAGACAGGC GCCCGTGACG AGGCCTACAT 420 

AGCCTGATCC GACCACCGAA ATTTTCAAGA TGACCCCTTC AAGTCCCCGA TCGGTCGACG 480 

ACCATACTGC CGCAACTCTG TACCCTCCGT GGGTAATTCG CATGTCGCGT TCGTAAGGAG 540 

CAGCCAGCGA GTCGGGGACG TTCGGTGAGA GAGTCGCAGG ACTACGAGGT TGCCGGTGCG 600 

ATACATCACA GTGTTGCGTC TGTCGGCAAC GATGCAGCAA GAACCCACGG GGCAGCCCTG 660 

AACTGCGCGC ATGACCGGTC CTTGTCCTGG CACCTTTGAT CGGCCACCGC TTCCATGCGA 720 

ACATGACCGG AATCCATAGC GCGTGGTCAA GCAGCGGGGA GGTAGACGTC GGTGTCATCT 780 

GCTCCAACCG TGTCGGTGAT AACGATTTCG CTGAACGATC TCGAGGGATT GAAAAGCACC 840 

GTGGAGAGCG TTCGCGCGCA GCGCTATGGG GGGCGAATCG AGCACATCGT CATCGACGGT 900 

GGATCGGGCG ACGCCGTCGT GGAGTATCTG TCCGGCGATC CTGGCTTTGC ATATTGGCAA 960 

TCTCAGCCCG ACAACGGGAG ATATGACGCG ATGAATCAGG GCATTGCCCA TTCGTCGGGC 1020 



GACCTGTTGT GGTTTATGCA CTCCACGGAT CGTTTCTCCG ATCCAGATGC AGTCGCTTCC 1080 

GTGGTGGAGG CGCTCTCGGG GCATGGACCA GTACGTGATT TGTGGGGTTA CGGGAAAAAC 1140 

AACCTTGTCG GACTCGACGG CAAACCACTT TTCCCTCGGC CGTACGGCTA TATGCCGTTT 1200 

AAGATGCGGA AATTTCTGCT CGGCGCGACG GTTGCGCATC AGGCGACATT CTTCGGCGCG 1260 

TCGCTGGTAG CCAAGTTGGG CGGTTACGAT CTTGATTTTG GACTCGAGGC GGACCAGCTG 1320 

TTCATCTACC GTGCCGCACT AATACGGCCT CCCGTCACGA TCGACCGCGT GGTTTGCGAC 1380 

TTCGATGTCA CGGGACCTGG TTCAACCCAG CCCATCCGTG AGCACTATCG GACCCTGCGG 1440 

CGGCTCTGGG ACCTGCATGG CGACTACCCG CTGGGTGGGC GCAGAGTGTC GTGGGCTTAC 1500 

TTGCGTGTGA AGGAGTACTT GATTCGGGCC GACCTGGCCG CATTCAACGC GGTAAAGTTC 1560 

TTGCGAGCGA AGTTCGCCAG AGCTTCGCGG AAGCAAAATT CATAGAAACC AACTTCTACT 1620 

GCCTGACCTG AGCAGCGCCG AGGCGCGCAG CGCGATCAGT GCGACCTGAA CGGCCAGGTG 1680 

GAAAGCGCCA CCGATCCCGG CACCGAGTGC CTGACGCTTC GGATCCCTTG CACCACAACG 1740 

AGAGTGAGAG CGCCATGATG AGGAAATATC GGCTGGGCGG AGTCAACGCC GGAGTGACAA 1800 

AAGTGAGAAC CCGGTGAAGC GAGCGCTTAT AACAGGGATC ACGGGGCAGG ATGGTTCCTA 1860 

CCTCGCCGAG CTACTACTGA GCAAGGGATA CGAGGTTCAC GGGCTCGTTC GTCGAGCTTC 1920 

GACGTTTAAC ACGTCGCGGA TCGATCACCT CTACGTTGAC CCACACCAAC CGGGCGCGCG 1980 

CTTGTTCTTG CACTATGCAG ACCTCACTGA CGGCACCCGG TTGGTGACCC TGCTCAGCAG 2040 

TATCGACCCG GATGAGGTCT ACAACCTCGC AGCGCAGTCC CATGTGCGCG TCAGCTTTGA 2100 

CGAGCCAGTG CATACCGGAG ACACCACCGG CATGGGATCG ATCCGACTTC TGGAAGCAGT 2160 

CCGCCTTTCT CGGGTGGACT GCCGGTTCTA TCAGGCTTCC TCGTCGGAGA TGTTCGGCGC 2220 

ATCTCCGCCA CCGCAGAACG AATCGACGCC GTTCTATCCC CGTTCGCCAT ACGGCGCGGC 2280 

CAAGGTCTTC TCGTACTGGA CGACTCGCAA CTATCGAGAG GCGTACGGAT TATTCGCAGT 2340 

GAATGGCATC TTGTTCAACC ATGAGTCCCC CCGGCGCGGC GAGACTTTCG TGACCCGAAA 2400 

GATCACGCGT GCCGTGGCGC GCATCCGAGC TGGCGTCCAA TCGGAGGTCT ATATGGGCAA 2460 



CCTCGATGCG ATCCGCGACT GGGGCTACGC GCCCGAATAT GTCGAGGGGA TGTGGAGGAT 2520 

GTTGCAAGCG CCTGAACCTG ATGACTACGT CCTGGCGACA GGGCGTGGTT ACACCGTACG 2580 

TGAGTTCGCT CAAGCTGCTT TTGACCATGT CGGGCTCGAC TGGCAAAAGC GCGTCAAGTT 2640 

TGACGACCGC TATTTGCGTC CCACCGAGGT CGATTCGCTA GTAGGAGATG CCGACAAGGC 2700 

GGCCCAGTCA CTCGGCTGGA AAGCTTCGGT TCATACTGGT GAACTCGCGC GCATCATGGT 2760 

GGACGCGGAC ATCGCCGCGT TGGAGTGCGA TGGCACACCA TGGATCGACA CGCCGATGTT 2820 

GCCTGGTTGG GGCAGAGTAA GTTGACGACT ACACCTGGGC CTCTGGACCG CGCAACGCCC 2880 

GTGTATATCG CCGGTCATCG GGGGCTGGTC GGCTCAGCGC TCGTACGTAG ATTTGAGGCC 2940 

GAGGGGTTCA CCAATCTCAT TGTGCGATCA CGCGATGAGA TTGATCTGAC GGACCGAGCC 3000 

GCAACGTTTG ATTTTGTGTC TGAGACAAGA CCACAGGTGA TCATCGATGC GGCCGCACGG 3060 

GTCGGCGGCA TCATGGCGAA TAACACCTAT CCCGCGGACT TCTTGTCCGA AAACCTCCGA 3120 

ATCCAGACCA ATTTGCTCGA CGCAGCTGTC GCCGTGCGTG TGCCGCGGCT CCTTTTCCTC 3180 

GGTTCGTCAT GCATCTACCC GAAGTACGCT CCGCAACCTA TCCACGAGAG TGCTTTATTG 3240 

ACTGGCCCTT TGGAGCCCAC CAACGACGCG TATGCGATCG CCAAGATCGC CGGTATCCTG 3300 

CAAGTTCAGG CGGTTAGGCG CCAATATGGG CTGGCGTGGA TCTCTGCGAT GCCGACTAAC 3360 

CTCTACGGAC CCGGCGACAA CTTCTCCCCG TCCGGGTCGC ATCTCTTGCC GGCGCTCATC 3420 

CGTCGATATG AGGAAGCCAA AGCTGGTGGT GCAGAAGAGG TGACGAATTG GGGGACCGGT 3480 

ACTCCGCGGC GCGAACTTCT GCATGTCGAC GATCTGGCGA GCGCATGCCT GTTCCTTTTG 3540 

GAACATTTCG ATGGTCCGAA CCACGTCAAC GTGGGCACCG GCGTCGATCA CAGCATTAGC 3600 

GAGATCGCAG ACATGGTCGC TACAGCGGTG GGCTACATCG GCGAAACACG TTGGGATCCA 3660 

ACTAAACCCG ATGGAACCCC GCGCAAACTA TTGGACGTCT CCGCGCTACG CGAGTTGGGT 3720 

TGGCGCCCGC GAATCGCACT GAAAGACGGC ATCGATGCAA CGGTGTCGTG GTACCGCACA 3780 

AATGCCGATG CCGTGAGGAG GTAAAGCTGC GGGTCGGCCG ATGTTATCCC TCCGGCCGGA 3840 

CGGGTGGGGC GACCTGCCGT CGAGTGGTAC GGCAGTCGCC TGGCCGGCGA GGCGCGTGGC 3900 



CTATGGGAGT ATCCAATAGC CTGGCTTGGC TCGCCCCTAC GCATTATCAG TTGACCGCTT 3960 

TCGCGCCAGC TCGCAGGCTT GCGGCAGCAT CCCGTTCAGG TCTCCTCATG GTCCGGTGTG 4020 

GCACGACCAC GCAAGCTCGA ACCGACTCGT TTCCCAATTT CGCATGCTAA TATCGCTCGA 4080 

TGGATTTTTT GCGCAACGCC GGCTTGATGG CTCGTAACGT TAGTACCGAG ATGCTGCGCC 4140 

ACTTCGAACG AAAGCGCCTA TTAGTAAACC AATTCAAAGC ATACGGAGTC AACGTTGTTA 4200 

TTGATGTCGG TGCTAACTCC GGCCAGTTCG GTAGCGCTTT GCGTCGTGCA GGATTCAAGA 4260 

GCCGTATCGT TTCCTTTGAA CCTCTTTCGG GGCCATTTGC GCAACTAACG CGCAAGTCGG 4320 

CATCGGATCC ACTATGGGAG TGTCACCAGT ATGCCCTAGG CGACGCCGAT GAGACGATTA 4380 

CCATCAATGT GGCAGGCAAT GCGGGGGCAA GTAGTTCCGT GCTGCCGATG CTTAAAAGTC 4440 

ATCAAGATGC CTTTCCTCCC GCGAATTATA TTGGCACCGA AGACGTTGCA ATACACCGCC 4500 

TTGATTCGGT TGCATCAGAA TTTCTGAACC CTACCGATGT TACTTTCCTG AAGATCGACG 4560 

TACAGGGTTT CGAGAAGCAG GTTATCACGG GCAGTAAGTC AACGCTTAAC GAAAGCTGCG 4620 

TCGGCATGCA ACTCGAACTT TCTTTTATTC CGTTGTACGA AGGTGACATG CTGATTCATG 4680 

AAGCGCTTGA ACTTGTGTAT TCCCTAGGTT TCAGACTGAC GGGTTTGTTG CCCGGCTTTA 4740 

CGGATCCGCG CAATGGTCGA ATGCTTCAAG CTGACGGCAT TTTCTTCCGT GGGGACGATT 4800 

GACATAAATG CTCCGTCGGC ACCCTGCCGG TATCCAAACG GGCGATCTGG TGAGCCGGCC 4860 

TCCCGGGCAC CTAATCGACT ATCTAAATTG AGGCGGCCGC GACGTGCGGC ACGAACAGGT 4920 

GGCCGGCTGC TAGCGTTACA CACGTCATGA CTGCGCCAGT GTTCTCGATA ATTATCCCTA 4980 

CCTTCAATGC AGCGGTGACG CTGCAAGCCT GCCTCGGAAG CATCGTCGGG CAGACCTACC 5040 

GGGAAGTGGA AGTGGTCCTT GTCGACGGCG GTTCGACCGA TCGGACCCTC GACATCGCGA 5100 

ACAGTTTCCG CCCGGAACTC GGCTCGCGAC TGGTCGTTCA CAGCGGGCCC GATGATGGCC 5160 

CCTACGACGC CATGAACCGC GGCGTCGGCG TGGCCACAGG CGAATGGGTA CTTTTTTTAG 5220 

GCGCCGACGA CACCCTCTAC GAACCAACCA CGTTGGCCCA GGTAGCCGCT TTTCTCGGCG 5280 

ACCATGCGGC AAGCCATCTT GTCTATGGCG ATGTTGTGAT GCGTTCGACG AAAAGCCGGC 5340 



ATGCCGGACC TTTCGACCTC GACCGCCTCC TATTTGAGAC GAATTTGTGC CACCAATCGA 5400 

TCTTTTACCG CCGTGAGCTT TTCGACGGCA TCGGCCCTTA CAACCTGCGC TACCGAGTCT 5460 

GGGCGGACTG GGACTTCAAT ATTCGCTGCT TCTCCAACCC GGCGCTGATT ACCCGCTACA 5520 

TGGACGTCGT GATTTCCGAA TACAACGACA TGACCGGCTT CAGCATGAGG CAGGGGACTG 5580 

ATAAAGAGTT CAGAAAACGG CTGCCAATGT ACTTCTGGGT TGCAGGGTGG GAGACTTGCA 5640 

GGCGCATGCT GGCGTTTTTG AAAGACAAGG AGAATCGCCG TCTGGCCTTG CGTACGCGGT 5700 

TGATAAGGGT TAAGGCCGTC TCCAAAGAAC GAAGCGCAGA ACCGTAGTCG CGGATCCACA 5760 

TTGGACTTCT TTAACGCGTT TGCGTCCTGA TCCACCTTTC AAGCCCGTTC CGCGTAACGC 5820 

GGCGCGCAGA GAGTGGTCGC ATATCGCATC ACTGTTCTCG TGCCAGTGCT TGGAAAGCGT 5880 

CGAGCACTCT GGTTCGCGTT CTTGACGTTC GCGCCCGCTC CTAGAGGTAG CGTGTCACGT 5940 

GACTGAAGCC AATGAGTGCA ACTCGGCGTC GCGAAAGGTT TCAGTCGCGG TTGAGCAAGA 6000 

CACCGCAAGA CTACTGGAGT GCGTGCACAA GCGCCTCCAG CTCGCGGCTG AAAGCGGATG 6060 

CAAAGGGATT CGAAGCTTGA GCAACATGCG AAGGGGAGAA CGGCCTATGA GGCTGGGACA 6120 

GGTTTTCGAT CCGCGCGCGA ATGCACTGTC AATGGCCAAG TAGAAGTCCC CGCTGGTGGC 6180 

CAGCAGAAGT CCCCACTCCG CTGCGGGTGG TTGGCTAATT CTTGGCGGCT CCCTTCTTGT 6240 

GGTCGGCGTG GCGCATCCGG TAGGACTCGC CGGAGGTGAC GACGATGCTG GCGTGGTGCA 6300 

GCAGCCGATC GAGGATGCTG GCGGCGGTGG TGTGCTCGGG CAGGAATCGC CCCCATTGTT 6360 

CGAAGGGCCA ATGCGAGGCG ATGGCCAGGG AGCGGCGCTC GTAGCCGGCA GCCACGAGCC 6420 

GGAACAACAG TTGAGTCCCG GTGTCGTCGA GCGGGGCGAA GCCGATCTCG TCCAAGATGA 6480 

CCAGATCCGC GCGGAGCAGG GTGTCGATGA TCTTGCCGAC GGTGTTGTCG GCCAGGCCGC 6540 

GGTAGAGGAC CTCGATCAGG TCGGCGGCGG TGAAGTAGCG GACTTTGAAT CCGGCGTGGA 6600 

CGGCAGCGTG CCCGCAGCCG ATGAGCAGGT GACTTTTGCC CGTACCAGGT GGGCCAATGA 6660 

CCGCCAGGTT CTGTTGTGCC CGAATCCATT CCAGGCTCGA CAGGTAGTCG AACGTGGCTG 6720 

CGGTGATCGA CGATCCGGTG ACGTCGAACC CGTCGAGGGT CTTGGTGACC GGGAAGGCTG 6780 



CGGCCTTGAG ACGGTTGGCG GTGTTGGAGG CATCGCGGGC AGCGATCTCG GCCTCAACCA 6840 

ACGTCCGCAG GATCTCCTCC GGTGTCCAGC GTTGCGTCTT GGCGACTTGC AACACCTCGG 6900 

CGGCGTTGCG GCGCACCGTG GCCAGCTTCA ACCGCCGCAG CGCCGCGTCA AGGTCAGCAG 6960 

CCAGCGGTGC CGCCGAGGAC GGTGCCACCG GCTTGGCAGC GGTGGTCATG AGGCCGTCCC 7020 

GTCGGTGGTG TTGATCTTGT AGGCCTCCAA CGAGCGGGTC TCGACGGTGG GCAGATCGAG 7080 

CACGAGTGCG TCGCCGGCGG GGCGGGGTTG TGGGGTGCCG GCGCCGGCGG CCAGGATCGA 7140 

GCGCACGTCG GCAGCGCGGA ACCGGCGAAA CGCAACCGCC CGGCGCAGCG CGTCAATCAA 7200 

AGCCTGTTCG CCGTGGGCGG CGCCAAGGCC GAGCAGAATG TCGAGTTCGG ATTTCAGTCG 7260 

GGTGTTGCCG ATCGCAGCAG CACCGACGAG GAACTGCTGC GCTTCGGTTC CCAATGCGCA 7320 

GAATCGTTTC TCTGCTTGGG TTTTCGGGCG AGGACCACGC GAGGGTGCGG GTCTGGGTCC 7380 

GTCGTAGTGT TCATCGAGGA TGGACACCTC ACCTGGGCTG ACGAGCTCGT GCTCGGCCAC 7440 

GATCACACCG GTCGCAGGTT CCAACAGGAT CAGGGCGCCA TGATCGACCA CCACCGCCAC 7500 

GGTGGCACCG ACGAGCCGCT GAGGCACCGA GTAACGAGCT GAGCCGTAAC GGATGCACGA 7560 

GAGGCCGTCG ACCTTACGGC GCACCGACCC CGAGCCGATC GTCGGCCGCA GCGAGGGCAG 7620 

CTCCCTCAAG ACGGTGCGCT CGTCAACCAA GCGATCGTTG GGCACGGCGC AGATCTCCGA 7680 

GTGGACCGTG GCATTGACCT CGGCGCACCA TAGTTGCGCC TGGGCGTTGA GGGCACGTAG 7740 

GTCGACCTGC TCACCGGCTA ACGCAGCTTC GGTCAGCAGC GGCACCGCAA GGTCGTCCTG 7800 

AGCGTAGCCA CAGAGGTTCT CCACGATGCC CTTCGATTGC GGATCCGCAC CGTGGCAGAA 7860 

GTCCGGAACG AAGCCATAGT GGGACGCGAA TCGCACATAA TCCGGTGTTG GAACAACAAC 7920 

ATTGGCGACG ACACCACCTT TGAGGCAGCC CATCCGGTCG GCCAGGATCT TGGCCGGAAC 7980 

CCCACCGATC GCCTC 7995 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4435 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

TTCTACTGCC TGACCTGAGC AGCGCCGAGG CGCGCAGCGC GATCACTGCG ACCTGAATGG 60 

CCAGGTGGAA AGCGCCACCG ATCCCGGCAC CGAGTGCCTG ACGATTCGGA TCCCTTGCAC 120 

CACAACGAGA GTGAGACCGC CATGATGACG AAATATCGGC TGGGCGGAGT CAACGCCGGA 180 

GTGACAAAAG TGAGAACCCG GTGAAGCGAG CGCTTATAAC AGGGATCACG GGGCAGGATG 240 

GTTCCTACCT CGCCGAGCTA CTACTGAGCA AGGGATACGA GGTTCACGGG CTCGTTCGTC 300 

GAGCTTCGAC GTTTAACACG TCGCGGATCG ATCACCTCTA CGTTGACCCA CACCAACCGG 360 

GCGCGCGCTT GTTCTTGCAC TATGCAGACC TCACTGACGG CACCCGGTTG GTGACCCTGC 420 

TCAGCAGTAT CGACCCGGAT GAGGTCTACA ACCTCGCAGC GCAGTCCCAT GTGCGCGTCA 480 

GCTTTGACGA GCCAGTGCAT ACCGGAGACA CCACCGGCAT GGGATCGATC CGACTTCTGG 540 

AAGCAGTCCG CCTTTCTCGG GTGGACTGCC GGTTCTATCA GGCTTCCTCG TCGGAGATGT 600 

TCGGCGCATC TCCGCCACCG CAGAACGAAT CGACGCCGTT CTATCCCCGT TCGCCATACG 660 

GCGCGGCCAA GGTCTTCTCG TACTGGACGA CTCGCAACTA TCGAGAGGCG TACGGATTAT 720 

TCGCAGTGAA TGGCATCTTG TTCAACCATG AGTCCCCCCG GCGCGGCGAG ACTTTCGTGA 780 

CCCGAAAGAT CACGCGTGCC GTGGCGCGCA TCCGAGCTGG CTGCCAATCG GAGGTCTATA 840 

TGGGCAACCT CGATGCGATC CGCGACTGGG GCTACGCGCC CGAATATGTC GAGGGGATGT 900 

GGAGGATGTT GCAAGCGCCT GAACCTGATG ACTACGTCCT GGCGACAGGG CGTGGTTACA 960 

CCGTACGTGA GTTCGCTCAA GCTGCTTTTG ACCACGTCGG GCTCGACTGG CAAAAGCACG 1020 

TCAAGTTTGA CGACCGCTAT TTGCGCCCCA CCGAGGTCGA TTCGCTAGTA GGAGATGCCG 1080 

ACAGGGCGGC CCAGTCACTC GGCTGGAAAG CTTCGGTTCA TACTGGTGAA CTCGCGCGCA 1140 

TCATGGTGGA CGCGGACATC GCCGCGTCGG AGTGCGATGG CACACCATGG ATCGACACGC 1200 

CGATGTTGCC TGGTTGGGGC GGAGTAAGTT GACGACTACA CCTGGGCCTC TGGACCGCGC 1260 

AACGCCCGTG TATATCGCCG GTCATCGGGG GCTGGTCGGC TCAGCGCTCG TACGTAGATT 1320 

TGAGGCCGAG GGGTTCACCA ATCTCATTGT GCGATCACGC GATGAGATTG ATCTGACGGA 1380 

CCGAGCCGCA ACGTTTGATT TTGTGTCTGA GACAAGACCA CAGGTGATCA TCGATGCGGC 1440 

CGCACGGGTC GGCGGCATCA TGGCGAATAA CACCTATCCC GCGGACTTCT TGTCCGAAAA 1500 

CCTCCGAATC CAGACCAATT TGCTCGACGC AGCTGTCGCC GTGCGTGTGC CGCGGCTCCT 1560 

TTTCCTCGGT TCGTCATGCA TCTACCCGAA GTACGCTCCG CAACCTATCC ACGAGAGTGC 1620 

TTTATTGACT GGCCCTTTGG AGCCCACCAA CGACGCGTAT GCGATCGCCA AGATCGCCGG 1680 

TATCCTGCAA GTTCAGGCGG TTAGGCGCCA ATATGGGCTG GCGTGGATCT CTGCGATGCC 1740 

GACTAACCTC TACGGACCCG GCGACAACTT CTCCCCGTCC GGGTCGCATC TCTTGCCGGC 1800 

GCTCATCCGT CGATATGAGG AAGCCAAAGC TGGTGGTGCA GAAGAGGTGA CGAATTGGGG 1860 

GACCGGTACT CCGCGGCGCG AACTTCTGCA TGTCGACGAT CTGGCGAGCG CATGCCTGTT 1920 

CCTTTTGGAA CATTTCGATG GTCCGAACCA CGTCAACGTG GGCACCGGCG TCGATCACAG 1980 

CATTAGCGAG ATCGCAGACA TGGTCGCTAC GGCGGTGGGC TACATCGGCG AAACACGTTG 2040 

GGATCCAACT AAACCCGATG GAACCCCGCG CAAACTATTG GACGTCTCCG CGCTACGCGA 2100 

GTTGGGTTGG CGCCCGCGAA TCGCACTGAA AGACGGCATC GATGCAACGG TGTCGTGGTA 2160 

CCGCACAAAT GCCGATGCCG TGAGGAGGTA AAGCTGCGGG CCGGCCGATG TTATCCCTCC 2220 

GGCCGGACGG GTAGGGCGAC CTGCCATCGA GTGGTACGGC AGTCGCCTGG CCGGCGAGGC 2280 

GCATGGCCTA TGGGAGTATC CCATAGCCTG GCTTGGCTCG CCCCTACGCA TTATCAGTTG 2340 

ACCGCTTTCG CGCCAGCTCG CAGGCTCGCG GCAGCATCCC GTTCAGGTCT CCTCATGGTC 2400 

CGGTGTGGCA CGACCACGCA AGCTCGAACC GACTCGTTTC CCAATTTCGC ATGCTAATAT 2460 

CGCTCGATGG ATTTTTTGCG CAACGCCGGC TTGATGGCTC GTAACGTTAG CACCGAGATG 2520 

CTGCGCCACT TCGAACGAAA GCGCCTATTA GTAAACCAAT TCAAAGCATA CGGAGTCAAC 2580 

GTTGTTATTG ATGTCGGTGC TAACTCCGGC CAGTTCGGTA GCGCTTTGCG TCGTGCAGGA 2640 

TTCAAGAGCC GTATCGTTTC CTTTGAACCT CTTTCGGGGC CATTTGCGCA ACTAACGCGC 2700 

GAGTCGGCAT CGGATCCACT ATGGGAGTGT CACCAGTATG CCCTAGGCGA CGCCGATGAG 2760 

ACGATTACCA TCAATGTGGC AGGCAATGCG GGGGCAAGTA GTTCCGTGCT GCCGATGCTT 2820 

AAAAGTCATC AAGATGCCTT TCCTCCCGCG AATTATATTG GCACCGAAGA CGTTGCAATA 2880 

CACCGCCTTG ATTCGGTTGC ATCAGAATTT CTGAACCCTA CCGATGTTAC TTTCCTGAAG 2940 

ATCGACGTAC AGGGTTTCGA GAAGCAGGTT ATCGCGGGCA GTAAGTCAAC GCTTAACGAA 3000 

AGCTGCGTCG GCATGCAACT CGAACTTTCT TTTATTCCGT TGTACGAAGG TGACATGCTG 3060 

ATTCATGAAG CGCTTGAACT TGTCTATTCC CTAGGTTTCA GACTGACGGG TTTGTTGCCC 3120 

GGATTTACGG ATCCGCGCAA TGGTCGAATG CTTCAAGCTG ACGGCATTTT CTTCCGTGGG 3180 

GACGATTGAC ATAAATGCTT GCGTCGGCAC CCTGCCGGTA TCCAAACGGG CGATCTGGTG 3240 

AGCCGGCCTC CCGGGCACCT AATCGACTAT CTAAATTGAG GCGGCCGCGA CGTGCGGCAC 3300 

GAACAGGTGG CCGGCTGCTA GCGTTACACA CGTCATGACT GCGCCAGTGT TCTCGATAAT 3360 

TATCCCTACC TTCAATGCAG CGGTGACGCT GCAAGCCTGC CTCGGAAGCA TCGTCGGGCA 3420 

GACCTACCGG GAAGTGGAAG TGGTCCTTGT CGACGGCGGT TCGACCGATC GGACCCTCGA 3480 

CATCGCGAAC AGTTTCCGCC CGGAACTCGG CTCGCGACTG GTCGTTCACA GCGGGCCCGA 3540 

TGATGGCCCC TACGACGCCA TGAACCGCGG CGTCGGCGTA GCCACAGGCG AATGGGTACT 3600 

TTTTTTAGGC GCCGACGACA CCCTCTACGA ACCAACCACG TTGGCCCAGG TAGCCGCTTT 3660 

TCTCGGCGAC CATGCGGCAA GCCATCTTGT CTATGGCGAT GTTGTGATGC GTTCGACGAA 3720 

AAGCCGGCAT GCCGGACCTT TCGACCTCGA CCGCCTCCTA TTTGAGACGA ATTTGTGCCA 3780 

CCAATCGATC TTTTACCGCC GTGAGCTTTT CGACGGCATC GGCCCTTACA ACCTGCGCTA 3840 

CCGAGTCTGG GCGGACTGGG ACTTCAATAT TCGCTGCTTC TCCAACCCGG CGCTGATTAC 3900 

CCGCTACATG GACGTCGTGA TTTCCGAATA CAACGACATG ACCGGCTTCA GCATGAGGCA 3960 

GGGGACTGAT AAAGAGTTCA GAAAACGGCT GCCAATGTAC TTCTGGGTTG CAGGGTGGGA 4020 

GACTTGCAGG CGCATGCTGG CGTTTTTGAA AGACAAGGAG AATCGCCGTC TGGCCTTGCG 4080 

TACGCGGTTG ATAAGGGTTA AGGCCGTCTC CAAAGAACGA AGCGCAGAAC CGTAGTCGCG 4140 

GATCCACATT GGACTTCTTT AACGCGTTTG CGTCCTGATC CACCTTTCAA CCCCGTTCCG 4200 

CGTGACGCGG CGCGCAGAGA GTGGTCGCAT ATCGCGTCAC TGTTCTCGTG CCAGTGCTTG 4260 

GAAAGCGTCG AGCACTCTGG TTCGCGTTCT TGACGTTCGC GCCCGCCCCT AGAGGTAGCG 4320 

TGTCACGTGA CTGAAGCCAA TGAGTGCAAC TCGGCGTCGC GAAAGGTTTC AGTCGCGGTT 4380 

GAGCAAGACA CCGCAAGACT ACTGGAGTGC GTGCACAAGC GCCTCCAGCT CACGG 4435 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 378 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..375 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

ATG ATC GCT GTG ATC TGG TCG GCG GTG CCG ACA GGA ACC GTC GAC TTG 48 
Met Ile Ala Val Ile Trp Ser Ala Val Pro Thr Gly Thr Val Asp Leu 
1 5 10 15 

TCG ACG ATC ACC TTG TAC CGG TCG ATG TAT GAC CCA ATG TCG TCC GCA 96 
Ser Thr Ile Thr Leu Tyr Arg Ser Met Tyr Asp Pro Met Ser Ser Ala 
20 25 30 

ACC GAG AAG ACG TAC GTC AGG TCC GCC GCC CCG CTT TCA CCC ATG GGC 144 
Thr Glu Lys Thr Tyr Val Arg Ser Ala Ala Pro Leu Ser Pro Met Gly 
35 40 45 

GTC GGG ACG GCG ATG AAA ATG ACG TCC GCG TGC TCG ATT CCG CGT TGC 192 
Val Gly Thr Ala Met Lys Met Thr Ser Ala Cys Ser Ile Pro Arg Cys 
50 55 60 

CGG TCG GTG GTG AAG TCA ATC AGC CCG TTC TCA CGG TTC CTC GCA ATC 240 
Arg Ser Val Val Lys Ser Ile Ser Pro Phe Ser Arg Phe Leu Ala Ile 
65 70 75 80 

AAC TCC CAA CCC GGG CTC GAA AAT CGG GAC ACT GCC TGC GAG GAG CAA 288 
Asn Ser Gln Pro Gly Leu Glu Asn Arg Asp Thr Ala Cys Glu Glu Gln 
85 90 95 

ATC GAT CTT GGC CTG ATC GAT ATC GAC ACA GAC GAC ATC GTT GCC GCT 336 
Ile Asp Leu Gly Leu Ile Asp Ile Asp Thr Asp Asp Ile Val Ala Ala 
100 105 110 

ATC CGC GAG ACA GGC GCC CGT GAC GAG GCC TAC ATA GCC TGA 378 
Ile Arg Glu Thr Gly Ala Arg Asp Glu Ala Tyr Ile Ala 
115 120 125 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 125 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Ile Ala Val Ile Trp Ser Ala Val Pro Thr Gly Thr Val Asp Leu 
1 5 10 15 

Ser Thr Ile Thr Leu Tyr Arg Ser Met Tyr Asp Pro Met Ser Ser Ala 
20 25 30 

Thr Glu Lys Thr Tyr Val Arg Ser Ala Ala Pro Leu Ser Pro Met Gly 
35 40 45 

Val Gly Thr Ala Met Lys Met Thr Ser Ala Cys Ser Ile Pro Arg Cys 
50 55 60 

Arg Ser Val Val Lys Ser Ile Ser Pro Phe Ser Arg Phe Leu Ala Ile 
65 70 75 80 

Asn Ser Gln Pro Gly Leu Glu Asn Arg Asp Thr Ala Cys Glu Glu Gln 
85 90 95 

Ile Asp Leu Gly Leu Ile Asp Ile Asp Thr Asp Asp Ile Val Ala Ala 
100 105 110 

Ile Arg Glu Thr Gly Ala Arg Asp Glu Ala Tyr Ile Ala 
115 120 125 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 834 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..831 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

GTG TCA TCT GCT CCA ACC GTG TCG GTG ATA ACG ATT TCG CTG AAC GAT 48 
Val Ser Ser Ala Pro Thr Val Ser Val Ile Thr Ile Ser Leu Asn Asp 
130 135 140 

CTC GAG GGA TTG AAA AGC ACC GTG GAG AGC GTT CGC GCG CAG CGC TAT 96 
Leu Glu Gly Leu Lys Ser Thr Val Glu Ser Val Arg Ala Gln Arg Tyr 
145 150 155 

GGG GGG CGA ATC GAG CAC ATC GTC ATC GAC GGT GGA TCG GGC GAC GCC 144 
Gly Gly Arg Ile Glu His Ile Val Ile Asp Gly Gly Ser Gly Asp Ala 
160 165 170 

GTC GTG GAG TAT CTG TCC GGC GAT CCT GGC TTT GCA TAT TGG CAA TCT 192 
Val Val Glu Tyr Leu Ser Gly Asp Pro Gly Phe Ala Tyr Trp Gln Ser 
175 180 185 

CAG CCC GAC AAC GGG AGA TAT GAC GCG ATG AAT CAG GGC ATT GCC CAT 240 
Gln Pro Asp Asn Gly Arg Tyr Asp Ala Met Asn Gln Gly Ile Ala His 
190 195 200 205 

TCG TCG GGC GAC CTG TTG TGG TTT ATG CAC TCC ACG GAT CGT TTC TCC 288 
Ser Ser Gly Asp Leu Leu Trp Phe Met His Ser Thr Asp Arg Phe Ser 
210 215 220 

GAT CCA GAT GCA GTC GCT TCC GTG GTG GAG GCG CTC TCG GGG CAT GGA 336 
Asp Pro Asp Ala Val Ala Ser Val Val Glu Ala Leu Ser Gly His Gly 
225 230 235 

CCA GTA CGT GAT TTG TGG GGT TAC GGG AAA AAC AAC CTT GTC GGA CTC 384 
Pro Val Arg Asp Leu Trp Gly Tyr Gly Lys Asn Asn Leu Val Gly Leu 
240 245 250 

GAC GGC AAA CCA CTT TTC CCT CGG CCG TAC GGC TAT ATG CCG TTT AAG 432 
Asp Gly Lys Pro Leu Phe Pro Arg Pro Tyr Gly Tyr Met Pro Phe Lys 
255 260 265 

ATG CGG AAA TTT CTG CTC GGC GCG ACG GTT GCG CAT CAG GCG ACA TTC 480 
Met Arg Lys Phe Leu Leu Gly Ala Thr Val Ala His Gln Ala Thr Phe 
270 275 280 285 

TTC GGC GCG TCG CTG GTA GCC AAG TTG GGC GGT TAC GAT CTT GAT TTT 528 
Phe Gly Ala Ser Leu Val Ala Lys Leu Gly Gly Tyr Asp Leu Asp Phe 
290 295 300 

GGA CTC GAG GCG GAC CAG CTG TTC ATC TAC CGT GCC GCA CTA ATA CGG 576 
Gly Leu Glu Ala Asp Gln Leu Phe Ile Tyr Arg Ala Ala Leu Ile Arg 
305 310 315 

CCT CCC GTC ACG ATC GAC CGC GTG GTT TGC GAC TTC GAT GTC ACG GGA 624 
Pro Pro Val Thr Ile Asp Arg Val Val Cys Asp Phe Asp Val Thr Gly 
320 325 330 

CCT GGT TCA ACC CAG CCC ATC CGT GAG CAC TAT CGG ACC CTG CGG CGG 672 
Pro Gly Ser Thr Gln Pro Ile Arg Glu His Tyr Arg Thr Leu Arg Arg 
335 340 345 

CTC TGG GAC CTG CAT GGC GAC TAC CCG CTG GGT GGG CGC AGA GTG TCG 720 
Leu Trp Asp Leu His Gly Asp Tyr Pro Leu Gly Gly Arg Arg Val Ser 
350 355 360 365 

TGG GCT TAC TTG CGT GTG AAG GAG TAC TTG ATT CGG GCC GAC CTG GCC 768 
Trp Ala Tyr Leu Arg Val Lys Glu Tyr Leu Ile Arg Ala Asp Leu Ala 
370 375 380 

GCA TTC AAC GCG GTA AAG TTC TTG CGA GCG AAG TTC GCC AGA GCT TCG 816 
Ala Phe Asn Ala Val Lys Phe Leu Arg Ala Lys Phe Ala Arg Ala Ser 
385 390 395 

CGG AAG CAA AAT TCA TAG 834 
Arg Lys Gln Asn Ser 
400 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 277 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 



Val Ser Ser Ala Pro Thr Val Ser Val Ile Thr Ile Ser Leu Asn Asp 
1 5 10 15 

Leu Glu Gly Leu Lys Ser Thr Val Glu Ser Val Arg Ala Gln Arg Tyr 
20 25 30 

Gly Gly Arg Ile Glu His Ile Val Ile Asp Gly Gly Ser Gly Asp Ala 
35 40 45 

Val Val Glu Tyr Leu Ser Gly Asp Pro Gly Phe Ala Tyr Trp Gln Ser 
50 55 60 

Gln Pro Asp Asn Gly Arg Tyr Asp Ala Met Asn Gln Gly Ile Ala His 
65 70 75 80 

Ser Ser Gly Asp Leu Leu Trp Phe Met His Ser Thr Asp Arg Phe Ser 
85 90 95 

Asp Pro Asp Ala Val Ala Ser Val Val Glu Ala Leu Ser Gly His Gly 
100 105 110 

Pro Val Arg Asp Leu Trp Gly Tyr Gly Lys Asn Asn Leu Val Gly Leu 
115 120 125 

Asp Gly Lys Pro Leu Phe Pro Arg Pro Tyr Gly Tyr Met Pro Phe Lys 
130 135 140 

Met Arg Lys Phe Leu Leu Gly Ala Thr Val Ala His Gln Ala Thr Phe 
145 150 155 160 

Phe Gly Ala Ser Leu Val Ala Lys Leu Gly Gly Tyr Asp Leu Asp Phe 
165 170 175 

Gly Leu Glu Ala Asp Gln Leu Phe Ile Tyr Arg Ala Ala Leu Ile Arg 
180 185 190 

Pro Pro Val Thr Ile Asp Arg Val Val Cys Asp Phe Asp Val Thr Gly 
195 200 205 

Pro Gly Ser Thr Gln Pro Ile Arg Glu His Tyr Arg Thr Leu Arg Arg 
210 215 220 

Leu Trp Asp Leu His Gly Asp Tyr Pro Leu Gly Gly Arg Arg Val Ser 
225 230 235 240 

Trp Ala Tyr Leu Arg Val Lys Glu Tyr Leu Ile Arg Ala Asp Leu Ala 
245 250 255 

Ala Phe Asn Ala Val Lys Phe Leu Arg Ala Lys Phe Ala Arg Ala Ser 
260 265 270 

Arg Lys Gln Asn Ser 
275 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1032 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1029 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

GTG AAG CGA GCG CTT ATA ACA GGG ATC ACG GGG CAG GAT GGT TCC TAC 48 

Val Lys Arg Ala Leu He Thr Gly He Thr Gly Gin Asp Gly Ser Tyr 

280 285 290 

CTC GCC GAG CTA CTA CTG AGC AAG GGA TAC GAG GTT CAC GGG CTC GTT 96 

Leu Ala Glu Leu Leu Leu Ser Lys Gly Tyr Glu Val His Gly Leu Val 

295 300 305 

CGT CGA GCT TCG ACG TTT AAC ACG TCG CGG ATC GAT CAC CTC TAC GTT 144 

Arg Arg Ala Ser Thr Phe Asn Thr Ser Arg He Asp His Leu Tyr Val 

310 315 320 325 

GAC CCA CAC CAA CCG GGC GCG CGC TTG TTC TTG CAC TAT GCA GAC CTC 192 

Asp Pro His Gln Pro Gly Ala Arg Leu Phe Leu His Tyr Ala Asp Leu

330 335 340

ACT GAC GGC ACC CGG TTG GTG ACC CTG CTC AGC AGT ATC GAC CCG GAT 240 

Thr Asp Gly Thr Arg Leu Val Thr Leu Leu Ser Ser Ile Asp Pro Asp

345 350 355 

GAG GTC TAC AAC CTC GCA GCG CAG TCC CAT GTG CGC GTC AGC TTT GAC 288 

Glu Val Tyr Asn Leu Ala Ala Gln Ser His Val Arg Val Ser Phe Asp

360 365 370 



GAG CCA GTG CAT ACC GGA GAC ACC ACC GGC ATG GGA TCG ATC CGA CTT 336
Glu Pro Val His Thr Gly Asp Thr Thr Gly Met Gly Ser Ile Arg Leu
375 380 385 

CTG GAA GCA GTC CGC CTT TCT CGG GTG GAC TGC CGG TTC TAT CAG GCT 384 
Leu Glu Ala Val Arg Leu Ser Arg Val Asp Cys Arg Phe Tyr Gln Ala
390 395 400 405 

TCC TCG TCG GAG ATG TTC GGC GCA TCT CCG CCA CCG CAG AAC GAA TCG 432 
Ser Ser Ser Glu Met Phe Gly Ala Ser Pro Pro Pro Gln Asn Glu Ser
410 415 420 

ACG CCG TTC TAT CCC CGT TCG CCA TAC GGC GCG GCC AAG GTC TTC TCG 480 
Thr Pro Phe Tyr Pro Arg Ser Pro Tyr Gly Ala Ala Lys Val Phe Ser 
425 430 435 

TAC TGG ACG ACT CGC AAC TAT CGA GAG GCG TAC GGA TTA TTC GCA GTG 528 
Tyr Trp Thr Thr Arg Asn Tyr Arg Glu Ala Tyr Gly Leu Phe Ala Val 
440 445 450 

AAT GGC ATC TTG TTC AAC CAT GAG TCC CCC CGG CGC GGC GAG ACT TTC 576 
Asn Gly Ile Leu Phe Asn His Glu Ser Pro Arg Arg Gly Glu Thr Phe
455 460 465 

GTG ACC CGA AAG ATC ACG CGT GCC GTG GCG CGC ATC CGA GCT GGC GTC 624
Val Thr Arg Lys Ile Thr Arg Ala Val Ala Arg Ile Arg Ala Gly Val
470 475 480 485 

CAA TCG GAG GTC TAT ATG GGC AAC CTC GAT GCG ATC CGC GAC TGG GGC 672 
Gln Ser Glu Val Tyr Met Gly Asn Leu Asp Ala Ile Arg Asp Trp Gly
490 495 500 

TAC GCG CCC GAA TAT GTC GAG GGG ATG TGG AGG ATG TTG CAA GCG CCT 720 
Tyr Ala Pro Glu Tyr Val Glu Gly Met Trp Arg Met Leu Gln Ala Pro
505 510 515 

GAA CCT GAT GAC TAC GTC CTG GCG ACA GGG CGT GGT TAC ACC GTA CGT 768 
Glu Pro Asp Asp Tyr Val Leu Ala Thr Gly Arg Gly Tyr Thr Val Arg 
520 525 530 

GAG TTC GCT CAA GCT GCT TTT GAC CAT GTC GGG CTC GAC TGG CAA AAG 816 
Glu Phe Ala Gln Ala Ala Phe Asp His Val Gly Leu Asp Trp Gln Lys
535 540 545 

CGC GTC AAG TTT GAC GAC CGC TAT TTG CGT CCC ACC GAG GTC GAT TCG 864 
Arg Val Lys Phe Asp Asp Arg Tyr Leu Arg Pro Thr Glu Val Asp Ser 
550 555 560 565 



CTA GTA GGA GAT GCC GAC AAG GCG GCC CAG TCA CTC GGC TGG AAA GCT 912
Leu Val Gly Asp Ala Asp Lys Ala Ala Gln Ser Leu Gly Trp Lys Ala
570 575 580



TCG GTT CAT ACT GGT GAA CTC GCG CGC ATC ATG GTG GAC GCG GAC ATC 960 
Ser Val His Thr Gly Glu Leu Ala Arg Ile Met Val Asp Ala Asp Ile
585 590 595 

GCC GCG TTG GAG TGC GAT GGC ACA CCA TGG ATC GAC ACG CCG ATG TTG 1008 
Ala Ala Leu Glu Cys Asp Gly Thr Pro Trp Ile Asp Thr Pro Met Leu
600 605 610

CCT GGT TGG GGC AGA GTA AGT TGA 1032 
Pro Gly Trp Gly Arg Val Ser 
615 620 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 343 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Val Lys Arg Ala Leu Ile Thr Gly Ile Thr Gly Gln Asp Gly Ser Tyr
1 5 10 15

Leu Ala Glu Leu Leu Leu Ser Lys Gly Tyr Glu Val His Gly Leu Val 
20 25 30 

Arg Arg Ala Ser Thr Phe Asn Thr Ser Arg Ile Asp His Leu Tyr Val
35 40 45 

Asp Pro His Gln Pro Gly Ala Arg Leu Phe Leu His Tyr Ala Asp Leu
50 55 60 

Thr Asp Gly Thr Arg Leu Val Thr Leu Leu Ser Ser Ile Asp Pro Asp
65 70 75 80 

Glu Val Tyr Asn Leu Ala Ala Gln Ser His Val Arg Val Ser Phe Asp
85 90 95 



Glu Pro Val His Thr Gly Asp Thr Thr Gly Met Gly Ser Ile Arg Leu
100 105 110 



Leu Glu Ala Val Arg Leu Ser Arg Val Asp Cys Arg Phe Tyr Gln Ala
115 120 125 

Ser Ser Ser Glu Met Phe Gly Ala Ser Pro Pro Pro Gln Asn Glu Ser
130 135 140 

Thr Pro Phe Tyr Pro Arg Ser Pro Tyr Gly Ala Ala Lys Val Phe Ser 
145 150 155 160 

Tyr Trp Thr Thr Arg Asn Tyr Arg Glu Ala Tyr Gly Leu Phe Ala Val 
165 170 175 

Asn Gly Ile Leu Phe Asn His Glu Ser Pro Arg Arg Gly Glu Thr Phe
180 185 190 

Val Thr Arg Lys Ile Thr Arg Ala Val Ala Arg Ile Arg Ala Gly Val
195 200 205 

Gln Ser Glu Val Tyr Met Gly Asn Leu Asp Ala Ile Arg Asp Trp Gly
210 215 220 

Tyr Ala Pro Glu Tyr Val Glu Gly Met Trp Arg Met Leu Gln Ala Pro
225 230 235 240 

Glu Pro Asp Asp Tyr Val Leu Ala Thr Gly Arg Gly Tyr Thr Val Arg 
245 250 255 

Glu Phe Ala Gln Ala Ala Phe Asp His Val Gly Leu Asp Trp Gln Lys
260 265 270 

Arg Val Lys Phe Asp Asp Arg Tyr Leu Arg Pro Thr Glu Val Asp Ser 
275 280 285 

Leu Val Gly Asp Ala Asp Lys Ala Ala Gln Ser Leu Gly Trp Lys Ala
290 295 300 

Ser Val His Thr Gly Glu Leu Ala Arg Ile Met Val Asp Ala Asp Ile
305 310 315 320 

Ala Ala Leu Glu Cys Asp Gly Thr Pro Trp Ile Asp Thr Pro Met Leu
325 330 335 



Pro Gly Trp Gly Arg Val Ser 
340 



(2) INFORMATION FOR SEQ ID NO: 11: 
(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 1032 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1029



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

GTG AAG CGA GCG CTT ATA ACA GGG ATC ACG GGG CAG GAT GGT TCC TAC 48 
Val Lys Arg Ala Leu Ile Thr Gly Ile Thr Gly Gln Asp Gly Ser Tyr
345 350 355 



CTC GCC GAG CTA CTA CTG AGC AAG GGA TAC GAG GTT CAC GGG CTC GTT 96
Leu Ala Glu Leu Leu Leu Ser Lys Gly Tyr Glu Val His Gly Leu Val
360 365 370 375

CGT CGA GCT TCG ACG TTT AAC ACG TCG CGG ATC GAT CAC CTC TAC GTT 144
Arg Arg Ala Ser Thr Phe Asn Thr Ser Arg Ile Asp His Leu Tyr Val
380 385 390

GAC CCA CAC CAA CCG GGC GCG CGC TTG TTC TTG CAC TAT GCA GAC CTC 192
Asp Pro His Gln Pro Gly Ala Arg Leu Phe Leu His Tyr Ala Asp Leu
395 400 405

ACT GAC GGC ACC CGG TTG GTG ACC CTG CTC AGC AGT ATC GAC CCG GAT 240
Thr Asp Gly Thr Arg Leu Val Thr Leu Leu Ser Ser Ile Asp Pro Asp
410 415 420

GAG GTC TAC AAC CTC GCA GCG CAG TCC CAT GTG CGC GTC AGC TTT GAC 288
Glu Val Tyr Asn Leu Ala Ala Gln Ser His Val Arg Val Ser Phe Asp
425 430 435

GAG CCA GTG CAT ACC GGA GAC ACC ACC GGC ATG GGA TCG ATC CGA CTT 336
Glu Pro Val His Thr Gly Asp Thr Thr Gly Met Gly Ser Ile Arg Leu
440 445 450 455

CTG GAA GCA GTC CGC CTT TCT CGG GTG GAC TGC CGG TTC TAT CAG GCT 384
Leu Glu Ala Val Arg Leu Ser Arg Val Asp Cys Arg Phe Tyr Gln Ala
460 465 470

TCC TCG TCG GAG ATG TTC GGC GCA TCT CCG CCA CCG CAG AAC GAA TCG 432
Ser Ser Ser Glu Met Phe Gly Ala Ser Pro Pro Pro Gln Asn Glu Ser
475 480 485



ACG CCG TTC TAT CCC CGT TCG CCA TAC GGC GCG GCC AAG GTC TTC TCG 480 
Thr Pro Phe Tyr Pro Arg Ser Pro Tyr Gly Ala Ala Lys Val Phe Ser 
490 495 500 

TAC TGG ACG ACT CGC AAC TAT CGA GAG GCG TAC GGA TTA TTC GCA GTG 528 
Tyr Trp Thr Thr Arg Asn Tyr Arg Glu Ala Tyr Gly Leu Phe Ala Val 
505 510 515 

AAT GGC ATC TTG TTC AAC CAT GAG TCC CCC CGG CGC GGC GAG ACT TTC 576 
Asn Gly Ile Leu Phe Asn His Glu Ser Pro Arg Arg Gly Glu Thr Phe
520 525 530 535 

GTG ACC CGA AAG ATC ACG CGT GCC GTG GCG CGC ATC CGA GCT GGC GTC 624 
Val Thr Arg Lys Ile Thr Arg Ala Val Ala Arg Ile Arg Ala Gly Val
540 545 550 

CAA TCG GAG GTC TAT ATG GGC AAC CTC GAT GCG ATC CGC GAC TGG GGC 672 
Gln Ser Glu Val Tyr Met Gly Asn Leu Asp Ala Ile Arg Asp Trp Gly
555 560 565 

TAC GCG CCC GAA TAT GTC GAG GGG ATG TGG AGG ATG TTG CAA GCG CCT 720 
Tyr Ala Pro Glu Tyr Val Glu Gly Met Trp Arg Met Leu Gln Ala Pro
570 575 580 

GAA CCT GAT GAC TAC GTC CTG GCG ACA GGG CGT GGT TAC ACC GTA CGT 768 
Glu Pro Asp Asp Tyr Val Leu Ala Thr Gly Arg Gly Tyr Thr Val Arg 
585 590 595 

GAG TTC GCT CAA GCT GCT TTT GAC CAC GTC GGG CTC GAC TGG CAA AAG 816 
Glu Phe Ala Gln Ala Ala Phe Asp His Val Gly Leu Asp Trp Gln Lys
600 605 610 615 

CAC GTC AAG TTT GAC GAC CGC TAT TTG CGC CCC ACC GAG GTC GAT TCG 864 
His Val Lys Phe Asp Asp Arg Tyr Leu Arg Pro Thr Glu Val Asp Ser 
620 625 630 

CTA GTA GGA GAT GCC GAC AGG GCG GCC CAG TCA CTC GGC TGG AAA GCT 912 
Leu Val Gly Asp Ala Asp Arg Ala Ala Gln Ser Leu Gly Trp Lys Ala
635 640 645 

TCG GTT CAT ACT GGT GAA CTC GCG CGC ATC ATG GTG GAC GCG GAC ATC 960 
Ser Val His Thr Gly Glu Leu Ala Arg Ile Met Val Asp Ala Asp Ile
650 655 660 

GCC GCG TCG GAG TGC GAT GGC ACA CCA TGG ATC GAC ACG CCG ATG TTG 1008
Ala Ala Ser Glu Cys Asp Gly Thr Pro Trp Ile Asp Thr Pro Met Leu
665 670 675 

CCT GGT TGG GGC GGA GTA AGT TGA 1032
Pro Gly Trp Gly Gly Val Ser 
680 685 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 343 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Val Lys Arg Ala Leu Ile Thr Gly Ile Thr Gly Gln Asp Gly Ser Tyr
1 5 10 15

Leu Ala Glu Leu Leu Leu Ser Lys Gly Tyr Glu Val His Gly Leu Val 
20 25 30 

Arg Arg Ala Ser Thr Phe Asn Thr Ser Arg Ile Asp His Leu Tyr Val
35 40 45 

Asp Pro His Gln Pro Gly Ala Arg Leu Phe Leu His Tyr Ala Asp Leu
50 55 60 

Thr Asp Gly Thr Arg Leu Val Thr Leu Leu Ser Ser Ile Asp Pro Asp
65 70 75 80 

Glu Val Tyr Asn Leu Ala Ala Gln Ser His Val Arg Val Ser Phe Asp
85 90 95 

Glu Pro Val His Thr Gly Asp Thr Thr Gly Met Gly Ser Ile Arg Leu
100 105 110 

Leu Glu Ala Val Arg Leu Ser Arg Val Asp Cys Arg Phe Tyr Gln Ala
115 120 125 

Ser Ser Ser Glu Met Phe Gly Ala Ser Pro Pro Pro Gln Asn Glu Ser
130 135 140 



Thr Pro Phe Tyr Pro Arg Ser Pro Tyr Gly Ala Ala Lys Val Phe Ser 
145 150 155 160 



Tyr Trp Thr Thr Arg Asn Tyr Arg Glu Ala Tyr Gly Leu Phe Ala Val 
165 170 175 



Asn Gly Ile Leu Phe Asn His Glu Ser Pro Arg Arg Gly Glu Thr Phe
180 185 190 

Val Thr Arg Lys Ile Thr Arg Ala Val Ala Arg Ile Arg Ala Gly Val
195 200 205 

Gln Ser Glu Val Tyr Met Gly Asn Leu Asp Ala Ile Arg Asp Trp Gly
210 215 220 

Tyr Ala Pro Glu Tyr Val Glu Gly Met Trp Arg Met Leu Gln Ala Pro
225 230 235 240 

Glu Pro Asp Asp Tyr Val Leu Ala Thr Gly Arg Gly Tyr Thr Val Arg 
245 250 255 

Glu Phe Ala Gln Ala Ala Phe Asp His Val Gly Leu Asp Trp Gln Lys
260 265 270 

His Val Lys Phe Asp Asp Arg Tyr Leu Arg Pro Thr Glu Val Asp Ser 
275 280 285 

Leu Val Gly Asp Ala Asp Arg Ala Ala Gln Ser Leu Gly Trp Lys Ala
290 295 300 

Ser Val His Thr Gly Glu Leu Ala Arg Ile Met Val Asp Ala Asp Ile
305 310 315 320 



Ala Ala Ser Glu Cys Asp Gly Thr Pro Trp Ile Asp Thr Pro Met Leu
325 330 335 



Pro Gly Trp Gly Gly Val Ser 
340 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1020 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..1020

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

GTG CGA TGG CAC ACC ATG GAT CGA CAC GCC GAT GTT GCC TGG TTG GGG 48
Val Arg Trp His Thr Met Asp Arg His Ala Asp Val Ala Trp Leu Gly
345 350 355

CAG AGT AAG TTG ACG ACT ACA CCT GGG CCT CTG GAC CGC GCA ACG CCC 96
Gln Ser Lys Leu Thr Thr Thr Pro Gly Pro Leu Asp Arg Ala Thr Pro
360 365 370 375

GTG TAT ATC GCC GGT CAT CGG GGG CTG GTC GGC TCA GCG CTC GTA CGT 144
Val Tyr Ile Ala Gly His Arg Gly Leu Val Gly Ser Ala Leu Val Arg
380 385 390

AGA TTT GAG GCC GAG GGG TTC ACC AAT CTC ATT GTG CGA TCA CGC GAT 192
Arg Phe Glu Ala Glu Gly Phe Thr Asn Leu Ile Val Arg Ser Arg Asp
395 400 405

GAG ATT GAT CTG ACG GAC CGA GCC GCA ACG TTT GAT TTT GTG TCT GAG 240
Glu Ile Asp Leu Thr Asp Arg Ala Ala Thr Phe Asp Phe Val Ser Glu
410 415 420

ACA AGA CCA CAG GTG ATC ATC GAT GCG GCC GCA CGG GTC GGC GGC ATC 288
Thr Arg Pro Gln Val Ile Ile Asp Ala Ala Ala Arg Val Gly Gly Ile
425 430 435

ATG GCG AAT AAC ACC TAT CCC GCG GAC TTC TTG TCC GAA AAC CTC CGA 336
Met Ala Asn Asn Thr Tyr Pro Ala Asp Phe Leu Ser Glu Asn Leu Arg
440 445 450 455

ATC CAG ACC AAT TTG CTC GAC GCA GCT GTC GCC GTG CGT GTG CCG CGG 384
Ile Gln Thr Asn Leu Leu Asp Ala Ala Val Ala Val Arg Val Pro Arg
460 465 470

CTC CTT TTC CTC GGT TCG TCA TGC ATC TAC CCG AAG TAC GCT CCG CAA 432
Leu Leu Phe Leu Gly Ser Ser Cys Ile Tyr Pro Lys Tyr Ala Pro Gln
475 480 485

CCT ATC CAC GAG AGT GCT TTA TTG ACT GGC CCT TTG GAG CCC ACC AAC 480
Pro Ile His Glu Ser Ala Leu Leu Thr Gly Pro Leu Glu Pro Thr Asn
490 495 500

GAC GCG TAT GCG ATC GCC AAG ATC GCC GGT ATC CTG CAA GTT CAG GCG 528
Asp Ala Tyr Ala Ile Ala Lys Ile Ala Gly Ile Leu Gln Val Gln Ala
505 510 515

GTT AGG CGC CAA TAT GGG CTG GCG TGG ATC TCT GCG ATG CCG ACT AAC 576
Val Arg Arg Gln Tyr Gly Leu Ala Trp Ile Ser Ala Met Pro Thr Asn
520 525 530 535

CTC TAC GGA CCC GGC GAC AAC TTC TCC CCG TCC GGG TCG CAT CTC TTG 624
Leu Tyr Gly Pro Gly Asp Asn Phe Ser Pro Ser Gly Ser His Leu Leu
540 545 550

CCG GCG CTC ATC CGT CGA TAT GAG GAA GCC AAA GCT GGT GGT GCA GAA 672
Pro Ala Leu Ile Arg Arg Tyr Glu Glu Ala Lys Ala Gly Gly Ala Glu
555 560 565

GAG GTG ACG AAT TGG GGG ACC GGT ACT CCG CGG CGC GAA CTT CTG CAT 720
Glu Val Thr Asn Trp Gly Thr Gly Thr Pro Arg Arg Glu Leu Leu His
570 575 580

GTC GAC GAT CTG GCG AGC GCA TGC CTG TTC CTT TTG GAA CAT TTC GAT 768
Val Asp Asp Leu Ala Ser Ala Cys Leu Phe Leu Leu Glu His Phe Asp
585 590 595

GGT CCG AAC CAC GTC AAC GTG GGC ACC GGC GTC GAT CAC AGC ATT AGC 816
Gly Pro Asn His Val Asn Val Gly Thr Gly Val Asp His Ser Ile Ser
600 605 610 615

GAG ATC GCA GAC ATG GTC GCT ACA GCG GTG GGC TAC ATC GGC GAA ACA 864
Glu Ile Ala Asp Met Val Ala Thr Ala Val Gly Tyr Ile Gly Glu Thr
620 625 630

CGT TGG GAT CCA ACT AAA CCC GAT GGA ACC CCG CGC AAA CTA TTG GAC 912
Arg Trp Asp Pro Thr Lys Pro Asp Gly Thr Pro Arg Lys Leu Leu Asp
635 640 645

GTC TCC GCG CTA CGC GAG TTG GGT TGG CGC CCG CGA ATC GCA CTG AAA 960
Val Ser Ala Leu Arg Glu Leu Gly Trp Arg Pro Arg Ile Ala Leu Lys
650 655 660

GAC GGC ATC GAT GCA ACG GTG TCG TGG TAC CGC ACA AAT GCC GAT GCC 1008
Asp Gly Ile Asp Ala Thr Val Ser Trp Tyr Arg Thr Asn Ala Asp Ala
665 670 675

GTG AGG AGG TAA 1020
Val Arg Arg *
680



(2) INFORMATION FOR SEQ ID NO: 14: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 340 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Val Arg Trp His Thr Met Asp Arg His Ala Asp Val Ala Trp Leu Gly 
1 5 10 15

Gln Ser Lys Leu Thr Thr Thr Pro Gly Pro Leu Asp Arg Ala Thr Pro
20 25 30 

Val Tyr Ile Ala Gly His Arg Gly Leu Val Gly Ser Ala Leu Val Arg
35 40 45 

Arg Phe Glu Ala Glu Gly Phe Thr Asn Leu Ile Val Arg Ser Arg Asp
50 55 60 

Glu Ile Asp Leu Thr Asp Arg Ala Ala Thr Phe Asp Phe Val Ser Glu
65 70 75 80 

Thr Arg Pro Gln Val Ile Ile Asp Ala Ala Ala Arg Val Gly Gly Ile
85 90 95 

Met Ala Asn Asn Thr Tyr Pro Ala Asp Phe Leu Ser Glu Asn Leu Arg 
100 105 110 

Ile Gln Thr Asn Leu Leu Asp Ala Ala Val Ala Val Arg Val Pro Arg
115 120 125 

Leu Leu Phe Leu Gly Ser Ser Cys Ile Tyr Pro Lys Tyr Ala Pro Gln
130 135 140 

Pro Ile His Glu Ser Ala Leu Leu Thr Gly Pro Leu Glu Pro Thr Asn
145 150 155 160 

Asp Ala Tyr Ala Ile Ala Lys Ile Ala Gly Ile Leu Gln Val Gln Ala
165 170 175 

Val Arg Arg Gln Tyr Gly Leu Ala Trp Ile Ser Ala Met Pro Thr Asn
180 185 190 



Leu Tyr Gly Pro Gly Asp Asn Phe Ser Pro Ser Gly Ser His Leu Leu 
195 200 205 



Pro Ala Leu Ile Arg Arg Tyr Glu Glu Ala Lys Ala Gly Gly Ala Glu
210 215 220 



Glu Val Thr Asn Trp Gly Thr Gly Thr Pro Arg Arg Glu Leu Leu His 
225 230 235 240 

Val Asp Asp Leu Ala Ser Ala Cys Leu Phe Leu Leu Glu His Phe Asp 
245 250 255 

Gly Pro Asn His Val Asn Val Gly Thr Gly Val Asp His Ser Ile Ser
260 265 270

Glu Ile Ala Asp Met Val Ala Thr Ala Val Gly Tyr Ile Gly Glu Thr
275 280 285 

Arg Trp Asp Pro Thr Lys Pro Asp Gly Thr Pro Arg Lys Leu Leu Asp 
290 295 300 

Val Ser Ala Leu Arg Glu Leu Gly Trp Arg Pro Arg Ile Ala Leu Lys
305 310 315 320 

Asp Gly Ile Asp Ala Thr Val Ser Trp Tyr Arg Thr Asn Ala Asp Ala
325 330 335 

Val Arg Arg * 
340 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1020 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1020



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

GTG CGA TGG CAC ACC ATG GAT CGA CAC GCC GAT GTT GCC TGG TTG GGG 48
Val Arg Trp His Thr Met Asp Arg His Ala Asp Val Ala Trp Leu Gly 
345 350 355 



CGG AGT AAG TTG ACG ACT ACA CCT GGG CCT CTG GAC CGC GCA ACG CCC 96
Arg Ser Lys Leu Thr Thr Thr Pro Gly Pro Leu Asp Arg Ala Thr Pro 
360 365 370 

GTG TAT ATC GCC GGT CAT CGG GGG CTG GTC GGC TCA GCG CTC GTA CGT 144 
Val Tyr Ile Ala Gly His Arg Gly Leu Val Gly Ser Ala Leu Val Arg
375 380 385 

AGA TTT GAG GCC GAG GGG TTC ACC AAT CTC ATT GTG CGA TCA CGC GAT 192 
Arg Phe Glu Ala Glu Gly Phe Thr Asn Leu Ile Val Arg Ser Arg Asp
390 395 400 

GAG ATT GAT CTG ACG GAC CGA GCC GCA ACG TTT GAT TTT GTG TCT GAG 240 
Glu Ile Asp Leu Thr Asp Arg Ala Ala Thr Phe Asp Phe Val Ser Glu
405 410 415 420 

ACA AGA CCA CAG GTG ATC ATC GAT GCG GCC GCA CGG GTC GGC GGC ATC 288 
Thr Arg Pro Gln Val Ile Ile Asp Ala Ala Ala Arg Val Gly Gly Ile
425 430 435 

ATG GCG AAT AAC ACC TAT CCC GCG GAC TTC TTG TCC GAA AAC CTC CGA 336 
Met Ala Asn Asn Thr Tyr Pro Ala Asp Phe Leu Ser Glu Asn Leu Arg 
440 445 450 

ATC CAG ACC AAT TTG CTC GAC GCA GCT GTC GCC GTG CGT GTG CCG CGG 384 
Ile Gln Thr Asn Leu Leu Asp Ala Ala Val Ala Val Arg Val Pro Arg
455 460 465 

CTC CTT TTC CTC GGT TCG TCA TGC ATC TAC CCG AAG TAC GCT CCG CAA 432 
Leu Leu Phe Leu Gly Ser Ser Cys Ile Tyr Pro Lys Tyr Ala Pro Gln
470 475 480 

CCT ATC CAC GAG AGT GCT TTA TTG ACT GGC CCT TTG GAG CCC ACC AAC 480 
Pro Ile His Glu Ser Ala Leu Leu Thr Gly Pro Leu Glu Pro Thr Asn
485 490 495 500 

GAC GCG TAT GCG ATC GCC AAG ATC GCC GGT ATC CTG CAA GTT CAG GCG 528 
Asp Ala Tyr Ala Ile Ala Lys Ile Ala Gly Ile Leu Gln Val Gln Ala
505 510 515 

GTT AGG CGC CAA TAT GGG CTG GCG TGG ATC TCT GCG ATG CCG ACT AAC 576 
Val Arg Arg Gln Tyr Gly Leu Ala Trp Ile Ser Ala Met Pro Thr Asn
520 525 530 

CTC TAC GGA CCC GGC GAC AAC TTC TCC CCG TCC GGG TCG CAT CTC TTG 624 
Leu Tyr Gly Pro Gly Asp Asn Phe Ser Pro Ser Gly Ser His Leu Leu 
535 540 545 



CCG GCG CTC ATC CGT CGA TAT GAG GAA GCC AAA GCT GGT GGT GCA GAA 672
Pro Ala Leu Ile Arg Arg Tyr Glu Glu Ala Lys Ala Gly Gly Ala Glu
550 555 560 

GAG GTG ACG AAT TGG GGG ACC GGT ACT CCG CGG CGC GAA CTT CTG CAT 720 

Glu Val Thr Asn Trp Gly Thr Gly Thr Pro Arg Arg Glu Leu Leu His 

565 570 575 580 

GTC GAC GAT CTG GCG AGC GCA TGC CTG TTC CTT TTG GAA CAT TTC GAT 768

Val Asp Asp Leu Ala Ser Ala Cys Leu Phe Leu Leu Glu His Phe Asp 

585 590 595 

GGT CCG AAC CAC GTC AAC GTG GGC ACC GGC GTC GAT CAC AGC ATT AGC 816 

Gly Pro Asn His Val Asn Val Gly Thr Gly Val Asp His Ser Ile Ser
600 605 610 

GAG ATC GCA GAC ATG GTC GCT ACG GCG GTG GGC TAC ATC GGC GAA ACA 864 

Glu Ile Ala Asp Met Val Ala Thr Ala Val Gly Tyr Ile Gly Glu Thr
615 620 625 

CGT TGG GAT CCA ACT AAA CCC GAT GGA ACC CCG CGC AAA CTA TTG GAC 912 

Arg Trp Asp Pro Thr Lys Pro Asp Gly Thr Pro Arg Lys Leu Leu Asp 
630 635 640 

GTC TCC GCG CTA CGC GAG TTG GGT TGG CGC CCG CGA ATC GCA CTG AAA 960 

Val Ser Ala Leu Arg Glu Leu Gly Trp Arg Pro Arg Ile Ala Leu Lys

645 650 655 660 

GAC GGC ATC GAT GCA ACG GTG TCG TGG TAC CGC ACA AAT GCC GAT GCC 1008 

Asp Gly Ile Asp Ala Thr Val Ser Trp Tyr Arg Thr Asn Ala Asp Ala

665 670 675 

GTG AGG AGG TAA 1020 
Val Arg Arg * 
680 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 340 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 



Val Arg Trp His Thr Met Asp Arg His Ala Asp Val Ala Trp Leu Gly 
1 5 10 15

Arg Ser Lys Leu Thr Thr Thr Pro Gly Pro Leu Asp Arg Ala Thr Pro 
20 25 30 

Val Tyr Ile Ala Gly His Arg Gly Leu Val Gly Ser Ala Leu Val Arg
35 40 45 

Arg Phe Glu Ala Glu Gly Phe Thr Asn Leu Ile Val Arg Ser Arg Asp
50 55 60 

Glu Ile Asp Leu Thr Asp Arg Ala Ala Thr Phe Asp Phe Val Ser Glu
65 70 75 80 

Thr Arg Pro Gln Val Ile Ile Asp Ala Ala Ala Arg Val Gly Gly Ile
85 90 95 

Met Ala Asn Asn Thr Tyr Pro Ala Asp Phe Leu Ser Glu Asn Leu Arg 
100 105 110 

Ile Gln Thr Asn Leu Leu Asp Ala Ala Val Ala Val Arg Val Pro Arg
115 120 125 

Leu Leu Phe Leu Gly Ser Ser Cys Ile Tyr Pro Lys Tyr Ala Pro Gln
130 135 140 

Pro Ile His Glu Ser Ala Leu Leu Thr Gly Pro Leu Glu Pro Thr Asn
145 150 155 160 

Asp Ala Tyr Ala Ile Ala Lys Ile Ala Gly Ile Leu Gln Val Gln Ala
165 170 175 

Val Arg Arg Gln Tyr Gly Leu Ala Trp Ile Ser Ala Met Pro Thr Asn
180 185 190 

Leu Tyr Gly Pro Gly Asp Asn Phe Ser Pro Ser Gly Ser His Leu Leu 
195 200 205 

Pro Ala Leu Ile Arg Arg Tyr Glu Glu Ala Lys Ala Gly Gly Ala Glu
210 215 220 

Glu Val Thr Asn Trp Gly Thr Gly Thr Pro Arg Arg Glu Leu Leu His 
225 230 235 240 



Val Asp Asp Leu Ala Ser Ala Cys Leu Phe Leu Leu Glu His Phe Asp 
245 250 255 



Gly Pro Asn His Val Asn Val Gly Thr Gly Val Asp His Ser Ile Ser
260 265 270 



Glu Ile Ala Asp Met Val Ala Thr Ala Val Gly Tyr Ile Gly Glu Thr
275 280 285 

Arg Trp Asp Pro Thr Lys Pro Asp Gly Thr Pro Arg Lys Leu Leu Asp 
290 295 300 

Val Ser Ala Leu Arg Glu Leu Gly Trp Arg Pro Arg Ile Ala Leu Lys
305 310 315 320 

Asp Gly Ile Asp Ala Thr Val Ser Trp Tyr Arg Thr Asn Ala Asp Ala
325 330 335 

Val Arg Arg * 
340 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 723 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..720



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

ATG GAT TTT TTG CGC AAC GCC GGC TTG ATG GCT CGT AAC GTT AGT ACC 48 

Met Asp Phe Leu Arg Asn Ala Gly Leu Met Ala Arg Asn Val Ser Thr 

345 350 355 

GAG ATG CTG CGC CAC TTC GAA CGA AAG CGC CTA TTA GTA AAC CAA TTC 96 

Glu Met Leu Arg His Phe Glu Arg Lys Arg Leu Leu Val Asn Gln Phe
360 365 370 



AAA GCA TAC GGA GTC AAC GTT GTT ATT GAT GTC GGT GCT AAC TCC GGC 144
Lys Ala Tyr Gly Val Asn Val Val Ile Asp Val Gly Ala Asn Ser Gly
375 380 385



CAG TTC GGT AGC GCT TTG CGT CGT GCA GGA TTC AAG AGC CGT ATC GTT 192 

Gln Phe Gly Ser Ala Leu Arg Arg Ala Gly Phe Lys Ser Arg Ile Val
390 395 400 

TCC TTT GAA CCT CTT TCG GGG CCA TTT GCG CAA CTA ACG CGC AAG TCG 240 

Ser Phe Glu Pro Leu Ser Gly Pro Phe Ala Gln Leu Thr Arg Lys Ser

405 410 415 420 

GCA TCG GAT CCA CTA TGG GAG TGT CAC CAG TAT GCC CTA GGC GAC GCC 288 

Ala Ser Asp Pro Leu Trp Glu Cys His Gln Tyr Ala Leu Gly Asp Ala
425 430 435 

GAT GAG ACG ATT ACC ATC AAT GTG GCA GGC AAT GCG GGG GCA AGT AGT 336 

Asp Glu Thr Ile Thr Ile Asn Val Ala Gly Asn Ala Gly Ala Ser Ser
440 445 450 

TCC GTG CTG CCG ATG CTT AAA AGT CAT CAA GAT GCC TTT CCT CCC GCG 384 

Ser Val Leu Pro Met Leu Lys Ser His Gln Asp Ala Phe Pro Pro Ala
455 460 465 

AAT TAT ATT GGC ACC GAA GAC GTT GCA ATA CAC CGC CTT GAT TCG GTT 432 

Asn Tyr Ile Gly Thr Glu Asp Val Ala Ile His Arg Leu Asp Ser Val
470 475 480 

GCA TCA GAA TTT CTG AAC CCT ACC GAT GTT ACT TTC CTG AAG ATC GAC 480 

Ala Ser Glu Phe Leu Asn Pro Thr Asp Val Thr Phe Leu Lys Ile Asp

485 490 495 500 

GTA CAG GGT TTC GAG AAG CAG GTT ATC ACG GGC AGT AAG TCA ACG CTT 528 

Val Gln Gly Phe Glu Lys Gln Val Ile Thr Gly Ser Lys Ser Thr Leu
505 510 515 

AAC GAA AGC TGC GTC GGC ATG CAA CTC GAA CTT TCT TTT ATT CCG TTG 576
Asn Glu Ser Cys Val Gly Met Gln Leu Glu Leu Ser Phe Ile Pro Leu
520 525 530 

TAC GAA GGT GAC ATG CTG ATT CAT GAA GCG CTT GAA CTT GTC TAT TCC 624 

Tyr Glu Gly Asp Met Leu Ile His Glu Ala Leu Glu Leu Val Tyr Ser
535 540 545 

CTA GGT TTC AGA CTG ACG GGT TTG TTG CCC GGC TTT ACG GAT CCG CGC 672 

Leu Gly Phe Arg Leu Thr Gly Leu Leu Pro Gly Phe Thr Asp Pro Arg 
550 555 560 

AAT GGT CGA ATG CTT CAA GCT GAC GGC ATT TTC TTC CGT GGG GAC GAT 720 

Asn Gly Arg Met Leu Gln Ala Asp Gly Ile Phe Phe Arg Gly Asp Asp

565 570 575 580 



TGA 723



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 240 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Met Asp Phe Leu Arg Asn Ala Gly Leu Met Ala Arg Asn Val Ser Thr 
1 5 10 15

Glu Met Leu Arg His Phe Glu Arg Lys Arg Leu Leu Val Asn Gln Phe
20 25 30 

Lys Ala Tyr Gly Val Asn Val Val Ile Asp Val Gly Ala Asn Ser Gly
35 40 45 

Gln Phe Gly Ser Ala Leu Arg Arg Ala Gly Phe Lys Ser Arg Ile Val
50 55 60 

Ser Phe Glu Pro Leu Ser Gly Pro Phe Ala Gln Leu Thr Arg Lys Ser
65 70 75 80 

Ala Ser Asp Pro Leu Trp Glu Cys His Gln Tyr Ala Leu Gly Asp Ala
85 90 95 

Asp Glu Thr Ile Thr Ile Asn Val Ala Gly Asn Ala Gly Ala Ser Ser
100 105 110 

Ser Val Leu Pro Met Leu Lys Ser His Gln Asp Ala Phe Pro Pro Ala
115 120 125 

Asn Tyr Ile Gly Thr Glu Asp Val Ala Ile His Arg Leu Asp Ser Val
130 135 140 

Ala Ser Glu Phe Leu Asn Pro Thr Asp Val Thr Phe Leu Lys Ile Asp
145 150 155 160 

Val Gln Gly Phe Glu Lys Gln Val Ile Thr Gly Ser Lys Ser Thr Leu
165 170 175 



Asn Glu Ser Cys Val Gly Met Gln Leu Glu Leu Ser Phe Ile Pro Leu
180 185 190 



Tyr Glu Gly Asp Met Leu Ile His Glu Ala Leu Glu Leu Val Tyr Ser
195 200 205 



Leu Gly Phe Arg Leu Thr Gly Leu Leu Pro Gly Phe Thr Asp Pro Arg 

210 215 220 

Asn Gly Arg Met Leu Gln Ala Asp Gly Ile Phe Phe Arg Gly Asp Asp

225 230 235 240 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 723 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..720



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

ATG GAT TTT TTG CGC AAC GCC GGC TTG ATG GCT CGT AAC GTT AGC ACC 48
Met Asp Phe Leu Arg Asn Ala Gly Leu Met Ala Arg Asn Val Ser Thr 
245 250 255 

GAG ATG CTG CGC CAC TTC GAA CGA AAG CGC CTA TTA GTA AAC CAA TTC 96 
Glu Met Leu Arg His Phe Glu Arg Lys Arg Leu Leu Val Asn Gln Phe
260 265 270 

AAA GCA TAC GGA GTC AAC GTT GTT ATT GAT GTC GGT GCT AAC TCC GGC 144 
Lys Ala Tyr Gly Val Asn Val Val Ile Asp Val Gly Ala Asn Ser Gly
275 280 285 

CAG TTC GGT AGC GCT TTG CGT CGT GCA GGA TTC AAG AGC CGT ATC GTT 192 
Gln Phe Gly Ser Ala Leu Arg Arg Ala Gly Phe Lys Ser Arg Ile Val
290 295 300 

TCC TTT GAA CCT CTT TCG GGG CCA TTT GCG CAA CTA ACG CGC GAG TCG 240 
Ser Phe Glu Pro Leu Ser Gly Pro Phe Ala Gln Leu Thr Arg Glu Ser
305 310 315 320 



GCA TCG GAT CCA CTA TGG GAG TGT CAC CAG TAT GCC CTA GGC GAC GCC 288
Ala Ser Asp Pro Leu Trp Glu Cys His Gln Tyr Ala Leu Gly Asp Ala

325 330 335 

GAT GAG ACG ATT ACC ATC AAT GTG GCA GGC AAT GCG GGG GCA AGT AGT 336 

Asp Glu Thr Ile Thr Ile Asn Val Ala Gly Asn Ala Gly Ala Ser Ser
340 345 350 

TCC GTG CTG CCG ATG CTT AAA AGT CAT CAA GAT GCC TTT CCT CCC GCG 384 

Ser Val Leu Pro Met Leu Lys Ser His Gln Asp Ala Phe Pro Pro Ala
355 360 365 

AAT TAT ATT GGC ACC GAA GAC GTT GCA ATA CAC CGC CTT GAT TCG GTT 432 

Asn Tyr Ile Gly Thr Glu Asp Val Ala Ile His Arg Leu Asp Ser Val
370 375 380 

GCA TCA GAA TTT CTG AAC CCT ACC GAT GTT ACT TTC CTG AAG ATC GAC 480 

Ala Ser Glu Phe Leu Asn Pro Thr Asp Val Thr Phe Leu Lys Ile Asp
385 390 395 400 

GTA CAG GGT TTC GAG AAG CAG GTT ATC GCG GGC AGT AAG TCA ACG CTT 528 

Val Gln Gly Phe Glu Lys Gln Val Ile Ala Gly Ser Lys Ser Thr Leu

405 410 415 

AAC GAA AGC TGC GTC GGC ATG CAA CTC GAA CTT TCT TTT ATT CCG TTG 576
Asn Glu Ser Cys Val Gly Met Gln Leu Glu Leu Ser Phe Ile Pro Leu
420 425 430 

TAC GAA GGT GAC ATG CTG ATT CAT GAA GCG CTT GAA CTT GTC TAT TCC 624 

Tyr Glu Gly Asp Met Leu Ile His Glu Ala Leu Glu Leu Val Tyr Ser
435 440 445 

CTA GGT TTC AGA CTG ACG GGT TTG TTG CCC GGA TTT ACG GAT CCG CGC 672 

Leu Gly Phe Arg Leu Thr Gly Leu Leu Pro Gly Phe Thr Asp Pro Arg 
450 455 460 

AAT GGT CGA ATG CTT CAA GCT GAC GGC ATT TTC TTC CGT GGG GAC GAT 720 

Asn Gly Arg Met Leu Gln Ala Asp Gly Ile Phe Phe Arg Gly Asp Asp
465 470 475 480 

TGA 723 



(2) INFORMATION FOR SEQ ID NO: 20: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 240 amino acids 

(B) TYPE: amino acid 



(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Met Asp Phe Leu Arg Asn Ala Gly Leu Met Ala Arg Asn Val Ser Thr 
1 5 10 15

Glu Met Leu Arg His Phe Glu Arg Lys Arg Leu Leu Val Asn Gln Phe
20 25 30 

Lys Ala Tyr Gly Val Asn Val Val Ile Asp Val Gly Ala Asn Ser Gly
35 40 45 

Gln Phe Gly Ser Ala Leu Arg Arg Ala Gly Phe Lys Ser Arg Ile Val
50 55 60 

Ser Phe Glu Pro Leu Ser Gly Pro Phe Ala Gln Leu Thr Arg Glu Ser
65 70 75 80 

Ala Ser Asp Pro Leu Trp Glu Cys His Gln Tyr Ala Leu Gly Asp Ala
85 90 95 

Asp Glu Thr Ile Thr Ile Asn Val Ala Gly Asn Ala Gly Ala Ser Ser
100 105 110 

Ser Val Leu Pro Met Leu Lys Ser His Gln Asp Ala Phe Pro Pro Ala
115 120 125 

Asn Tyr Ile Gly Thr Glu Asp Val Ala Ile His Arg Leu Asp Ser Val
130 135 140 

Ala Ser Glu Phe Leu Asn Pro Thr Asp Val Thr Phe Leu Lys Ile Asp
145 150 155 160 

Val Gln Gly Phe Glu Lys Gln Val Ile Ala Gly Ser Lys Ser Thr Leu
165 170 175 

Asn Glu Ser Cys Val Gly Met Gln Leu Glu Leu Ser Phe Ile Pro Leu
180 185 190 

Tyr Glu Gly Asp Met Leu Ile His Glu Ala Leu Glu Leu Val Tyr Ser
195 200 205 



Leu Gly Phe Arg Leu Thr Gly Leu Leu Pro Gly Phe Thr Asp Pro Arg 
210 215 220 



Asn Gly Arg Met Leu Gln Ala Asp Gly Ile Phe Phe Arg Gly Asp Asp
225 230 235 240 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 801 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..798



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

ATG ACT GCG CCA GTG TTC TCG ATA ATT ATC CCT ACC TTC AAT GCA GCG 48 

Met Thr Ala Pro Val Phe Ser Ile Ile Ile Pro Thr Phe Asn Ala Ala

245 250 255 

GTG ACG CTG CAA GCC TGC CTC GGA AGC ATC GTC GGG CAG ACC TAC CGG 96 

Val Thr Leu Gln Ala Cys Leu Gly Ser Ile Val Gly Gln Thr Tyr Arg

260 265 270 

GAA GTG GAA GTG GTC CTT GTC GAC GGC GGT TCG ACC GAT CGG ACC CTC 144 

Glu Val Glu Val Val Leu Val Asp Gly Gly Ser Thr Asp Arg Thr Leu 

275 280 285 

GAC ATC GCG AAC AGT TTC CGC CCG GAA CTC GGC TCG CGA CTG GTC GTT 192 

Asp Ile Ala Asn Ser Phe Arg Pro Glu Leu Gly Ser Arg Leu Val Val
290 295 300 

CAC AGC GGG CCC GAT GAT GGC CCC TAC GAC GCC ATG AAC CGC GGC GTC 240 

His Ser Gly Pro Asp Asp Gly Pro Tyr Asp Ala Met Asn Arg Gly Val 

305 310 315 320 

GGC GTG GCC ACA GGC GAA TGG GTA CTT TTT TTA GGC GCC GAC GAC ACC 288 

Gly Val Ala Thr Gly Glu Trp Val Leu Phe Leu Gly Ala Asp Asp Thr 

325 330 335 

CTC TAC GAA CCA ACC ACG TTG GCC CAG GTA GCC GCT TTT CTC GGC GAC 336 

Leu Tyr Glu Pro Thr Thr Leu Ala Gln Val Ala Ala Phe Leu Gly Asp
340 345 350



CAT GCG GCA AGC CAT CTT GTC TAT GGC GAT GTT GTG ATG CGT TCG ACG 384
His Ala Ala Ser His Leu Val Tyr Gly Asp Val Val Met Arg Ser Thr 
355 360 365 

AAA AGC CGG CAT GCC GGA CCT TTC GAC CTC GAC CGC CTC CTA TTT GAG 432 
Lys Ser Arg His Ala Gly Pro Phe Asp Leu Asp Arg Leu Leu Phe Glu 
370 375 380 

ACG AAT TTG TGC CAC CAA TCG ATC TTT TAC CGC CGT GAG CTT TTC GAC 480 
Thr Asn Leu Cys His Gln Ser Ile Phe Tyr Arg Arg Glu Leu Phe Asp
385 390 395 400 

GGC ATC GGC CCT TAC AAC CTG CGC TAC CGA GTC TGG GCG GAC TGG GAC 528 
Gly Ile Gly Pro Tyr Asn Leu Arg Tyr Arg Val Trp Ala Asp Trp Asp
405 410 415 

TTC AAT ATT CGC TGC TTC TCC AAC CCG GCG CTG ATT ACC CGC TAC ATG 576 
Phe Asn Ile Arg Cys Phe Ser Asn Pro Ala Leu Ile Thr Arg Tyr Met
420 425 430 

GAC GTC GTG ATT TCC GAA TAC AAC GAC ATG ACC GGC TTC AGC ATG AGG 624 
Asp Val Val Ile Ser Glu Tyr Asn Asp Met Thr Gly Phe Ser Met Arg
435 440 445 

CAG GGG ACT GAT AAA GAG TTC AGA AAA CGG CTG CCA ATG TAC TTC TGG 672 
Gln Gly Thr Asp Lys Glu Phe Arg Lys Arg Leu Pro Met Tyr Phe Trp
450 455 460 

GTT GCA GGG TGG GAG ACT TGC AGG CGC ATG CTG GCG TTT TTG AAA GAC 720 
Val Ala Gly Trp Glu Thr Cys Arg Arg Met Leu Ala Phe Leu Lys Asp 
465 470 475 480 

AAG GAG AAT CGC CGT CTG GCC TTG CGT ACG CGG TTG ATA AGG GTT AAG 768 
Lys Glu Asn Arg Arg Leu Ala Leu Arg Thr Arg Leu Ile Arg Val Lys
485 490 495 



GCC GTC TCC AAA GAA CGA AGC GCA GAA CCG TAG 801
Ala Val Ser Lys Glu Arg Ser Ala Glu Pro
500 505



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 266 amino acids 

(B) TYPE: amino acid 



(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Met Thr Ala Pro Val Phe Ser Ile Ile Ile Pro Thr Phe Asn Ala Ala
1 5 10 15

Val Thr Leu Gln Ala Cys Leu Gly Ser Ile Val Gly Gln Thr Tyr Arg
20 25 30 

Glu Val Glu Val Val Leu Val Asp Gly Gly Ser Thr Asp Arg Thr Leu 
35 40 45 

Asp Ile Ala Asn Ser Phe Arg Pro Glu Leu Gly Ser Arg Leu Val Val
50 55 60 

His Ser Gly Pro Asp Asp Gly Pro Tyr Asp Ala Met Asn Arg Gly Val 
65 70 75 80 

Gly Val Ala Thr Gly Glu Trp Val Leu Phe Leu Gly Ala Asp Asp Thr 
85 90 95 

Leu Tyr Glu Pro Thr Thr Leu Ala Gin Val Ala Ala Phe Leu Gly Asp 
100 105 110 

His Ala Ala Ser His Leu Val Tyr Gly Asp Val Val Met Arg Ser Thr 
115 120 125 

Lys Ser Arg His Ala Gly Pro Phe Asp Leu Asp Arg Leu Leu Phe Glu 
130 135 140 

Thr Asn Leu Cys His Gin Ser He Phe Tyr Arg Arg Glu Leu Phe Asp 
145 150 155 160 

Gly He Gly Pro Tyr Asn Leu Arg Tyr Arg Val Trp Ala Asp Trp Asp 
165 170 175 

Phe Asn He Arg Cys Phe Ser Asn Pro Ala Leu He Thr Arg Tyr Met 
180 185 190 

Asp Val Val He Ser Glu Tyr Asn Asp Met Thr Gly Phe Ser Met Arg 
195 200 205 

Gin Gly Thr Asp Lys Glu Phe Arg Lys Arg Leu Pro Met Tyr Phe Trp 
210 215 220 



Val Ala Gly Trp Glu Thr Cys Arg Arg Met Leu Ala Phe Leu Lys Asp 



225 



230 



235 



240 



Lys Glu Asn Arg Arg Leu Ala Leu Arg Thr Arg Leu He Arg Val Lys 
245 250 255 

Ala Val Ser Lys Glu Arg Ser Ala Glu Pro 
260 265 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 801 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..798



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

ATG ACT GCG CCA GTG TTC TCG ATA ATT ATC CCT ACC TTC AAT GCA GCG 48 
Met Thr Ala Pro Val Phe Ser Ile Ile Ile Pro Thr Phe Asn Ala Ala
270 275 280

GTG ACG CTG CAA GCC TGC CTC GGA AGC ATC GTC GGG CAG ACC TAC CGG 96
Val Thr Leu Gln Ala Cys Leu Gly Ser Ile Val Gly Gln Thr Tyr Arg
285 290 295

GAA GTG GAA GTG GTC CTT GTC GAC GGC GGT TCG ACC GAT CGG ACC CTC 144
Glu Val Glu Val Val Leu Val Asp Gly Gly Ser Thr Asp Arg Thr Leu
300 305 310

GAC ATC GCG AAC AGT TTC CGC CCG GAA CTC GGC TCG CGA CTG GTC GTT 192
Asp Ile Ala Asn Ser Phe Arg Pro Glu Leu Gly Ser Arg Leu Val Val
315 320 325 330 

CAC AGC GGG CCC GAT GAT GGC CCC TAC GAC GCC ATG AAC CGC GGC GTC 240 
His Ser Gly Pro Asp Asp Gly Pro Tyr Asp Ala Met Asn Arg Gly Val 
335 340 345 

GGC GTA GCC ACA GGC GAA TGG GTA CTT TTT TTA GGC GCC GAC GAC ACC 288 
Gly Val Ala Thr Gly Glu Trp Val Leu Phe Leu Gly Ala Asp Asp Thr
350 355 360

CTC TAC GAA CCA ACC ACG TTG GCC CAG GTA GCC GCT TTT CTC GGC GAC 336
Leu Tyr Glu Pro Thr Thr Leu Ala Gln Val Ala Ala Phe Leu Gly Asp
365 370 375 

CAT GCG GCA AGC CAT CTT GTC TAT GGC GAT GTT GTG ATG CGT TCG ACG 384 
His Ala Ala Ser His Leu Val Tyr Gly Asp Val Val Met Arg Ser Thr 
380 385 390 

AAA AGC CGG CAT GCC GGA CCT TTC GAC CTC GAC CGC CTC CTA TTT GAG 432 
Lys Ser Arg His Ala Gly Pro Phe Asp Leu Asp Arg Leu Leu Phe Glu 
395 400 405 410 

ACG AAT TTG TGC CAC CAA TCG ATC TTT TAC CGC CGT GAG CTT TTC GAC 480 
Thr Asn Leu Cys His Gln Ser Ile Phe Tyr Arg Arg Glu Leu Phe Asp
415 420 425

GGC ATC GGC CCT TAC AAC CTG CGC TAC CGA GTC TGG GCG GAC TGG GAC 528
Gly Ile Gly Pro Tyr Asn Leu Arg Tyr Arg Val Trp Ala Asp Trp Asp
430 435 440

TTC AAT ATT CGC TGC TTC TCC AAC CCG GCG CTG ATT ACC CGC TAC ATG 576
Phe Asn Ile Arg Cys Phe Ser Asn Pro Ala Leu Ile Thr Arg Tyr Met
445 450 455

GAC GTC GTG ATT TCC GAA TAC AAC GAC ATG ACC GGC TTC AGC ATG AGG 624
Asp Val Val Ile Ser Glu Tyr Asn Asp Met Thr Gly Phe Ser Met Arg
460 465 470

CAG GGG ACT GAT AAA GAG TTC AGA AAA CGG CTG CCA ATG TAC TTC TGG 672
Gln Gly Thr Asp Lys Glu Phe Arg Lys Arg Leu Pro Met Tyr Phe Trp
475 480 485 490

GTT GCA GGG TGG GAG ACT TGC AGG CGC ATG CTG GCG TTT TTG AAA GAC 720
Val Ala Gly Trp Glu Thr Cys Arg Arg Met Leu Ala Phe Leu Lys Asp
495 500 505

AAG GAG AAT CGC CGT CTG GCC TTG CGT ACG CGG TTG ATA AGG GTT AAG 768
Lys Glu Asn Arg Arg Leu Ala Leu Arg Thr Arg Leu Ile Arg Val Lys
510 515 520 

GCC GTC TCC AAA GAA CGA AGC GCA GAA CCG TAG 801 
Ala Val Ser Lys Glu Arg Ser Ala Glu Pro 
525 530 



(2) INFORMATION FOR SEQ ID NO: 24: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 266 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Met Thr Ala Pro Val Phe Ser Ile Ile Ile Pro Thr Phe Asn Ala Ala
1 5 10 15

Val Thr Leu Gln Ala Cys Leu Gly Ser Ile Val Gly Gln Thr Tyr Arg
20 25 30

Glu Val Glu Val Val Leu Val Asp Gly Gly Ser Thr Asp Arg Thr Leu
35 40 45

Asp Ile Ala Asn Ser Phe Arg Pro Glu Leu Gly Ser Arg Leu Val Val
50 55 60

His Ser Gly Pro Asp Asp Gly Pro Tyr Asp Ala Met Asn Arg Gly Val
65 70 75 80

Gly Val Ala Thr Gly Glu Trp Val Leu Phe Leu Gly Ala Asp Asp Thr
85 90 95

Leu Tyr Glu Pro Thr Thr Leu Ala Gln Val Ala Ala Phe Leu Gly Asp
100 105 110

His Ala Ala Ser His Leu Val Tyr Gly Asp Val Val Met Arg Ser Thr
115 120 125

Lys Ser Arg His Ala Gly Pro Phe Asp Leu Asp Arg Leu Leu Phe Glu
130 135 140

Thr Asn Leu Cys His Gln Ser Ile Phe Tyr Arg Arg Glu Leu Phe Asp
145 150 155 160

Gly Ile Gly Pro Tyr Asn Leu Arg Tyr Arg Val Trp Ala Asp Trp Asp
165 170 175

Phe Asn Ile Arg Cys Phe Ser Asn Pro Ala Leu Ile Thr Arg Tyr Met
180 185 190

Asp Val Val Ile Ser Glu Tyr Asn Asp Met Thr Gly Phe Ser Met Arg
195 200 205

Gln Gly Thr Asp Lys Glu Phe Arg Lys Arg Leu Pro Met Tyr Phe Trp
210 215 220

Val Ala Gly Trp Glu Thr Cys Arg Arg Met Leu Ala Phe Leu Lys Asp
225 230 235 240

Lys Glu Asn Arg Arg Leu Ala Leu Arg Thr Arg Leu Ile Arg Val Lys
245 250 255

Ala Val Ser Lys Glu Arg Ser Ala Glu Pro
260 265

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 867 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..864



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

GTG GCC AGC AGA AGT CCC CAC TCC GCT GCG GGT GGT TGG CTA ATT CTT 48
Val Ala Ser Arg Ser Pro His Ser Ala Ala Gly Gly Trp Leu Ile Leu
270 275 280

GGC GGC TCC CTT CTT GTG GTC GGC GTG GCG CAT CCG GTA GGA CTC GCC 96
Gly Gly Ser Leu Leu Val Val Gly Val Ala His Pro Val Gly Leu Ala
285 290 295

GGA GGT GAC GAC GAT GCT GGC GTG GTG CAG CAG CCG ATC GAG GAT GCT 144
Gly Gly Asp Asp Asp Ala Gly Val Val Gln Gln Pro Ile Glu Asp Ala
300 305 310

GGC GGC GGT GGT GTG CTC GGG CAG GAA TCG CCC CCA TTG TTC GAA GGG 192
Gly Gly Gly Gly Val Leu Gly Gln Glu Ser Pro Pro Leu Phe Glu Gly
315 320 325 330

CCA ATG CGA GGC GAT GGC CAG GGA GCG GCG CTC GTA GCC GGC AGC CAC 240
Pro Met Arg Gly Asp Gly Gln Gly Ala Ala Leu Val Ala Gly Ser His
335 340 345



GAG CCG GAA CAA CAG TTG AGT CCC GGT GTC GTC GAG CGG GGC GAA GCC 288 
Glu Pro Glu Gln Gln Leu Ser Pro Gly Val Val Glu Arg Gly Glu Ala
350 355 360

GAT CTC GTC CAA GAT GAC CAG ATC CGC GCG GAG CAG GGT GTC GAT GAT 336
Asp Leu Val Gln Asp Asp Gln Ile Arg Ala Glu Gln Gly Val Asp Asp
365 370 375

CTT GCC GAC GGT GTT GTC GGC CAG GCC GCG GTA GAG GAC CTC GAT CAG 384
Leu Ala Asp Gly Val Val Gly Gln Ala Ala Val Glu Asp Leu Asp Gln
380 385 390

GTC GGC GGC GGT GAA GTA GCG GAC TTT GAA TCC GGC GTG GAC GGC AGC 432
Val Gly Gly Gly Glu Val Ala Asp Phe Glu Ser Gly Val Asp Gly Ser
395 400 405 410

GTG CCC GCA GCC GAT GAG CAG GTG ACT TTT GCC CGT ACC AGG TGG GCC 480
Val Pro Ala Ala Asp Glu Gln Val Thr Phe Ala Arg Thr Arg Trp Ala
415 420 425

AAT GAC CGC CAG GTT CTG TTG TGC CCG AAT CCA TTC CAG GCT CGA CAG 528
Asn Asp Arg Gln Val Leu Leu Cys Pro Asn Pro Phe Gln Ala Arg Gln
430 435 440 

GTA GTC GAA CGT GGC TGC GGT GAT CGA CGA TCC GGT GAC GTC GAA CCC 576
Val Val Glu Arg Gly Cys Gly Asp Arg Arg Ser Gly Asp Val Glu Pro
445 450 455

GTC GAG GGT CTT GGT GAC CGG GAA GGC TGC GGC CTT GAG ACG GTT GGC 624
Val Glu Gly Leu Gly Asp Arg Glu Gly Cys Gly Leu Glu Thr Val Gly
460 465 470

GGT GTT GGA GGC ATC GCG GGC AGC GAT CTC GGC CTC AAC CAA CGT CCG 672
Gly Val Gly Gly Ile Ala Gly Ser Asp Leu Gly Leu Asn Gln Arg Pro
475 480 485 490

CAG GAT CTC CTC CGG TGT CCA GCG TTG CGT CTT GGC GAC TTG CAA CAC 720
Gln Asp Leu Leu Arg Cys Pro Ala Leu Arg Leu Gly Asp Leu Gln His
495 500 505

CTC GGC GGC GTT GCG GCG CAC CGT GGC CAG CTT CAA CCG CCG CAG CGC 768
Leu Gly Gly Val Ala Ala His Arg Gly Gln Leu Gln Pro Pro Gln Arg
510 515 520

CGC GTC AAG GTC AGC AGC CAG CGG TGC CGC CGA GGA CGG TGC CAC CGG 816
Arg Val Lys Val Ser Ser Gln Arg Cys Arg Arg Gly Arg Cys His Arg
525 530 535

CTT GGC AGC GGT GGT CAT GAG GCC GTC CCG TCG GTG GTG TTG ATC TTG 864
Leu Gly Ser Gly Gly His Glu Ala Val Pro Ser Val Val Leu Ile Leu
540 545 550 

TAG 867 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 288 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Val Ala Ser Arg Ser Pro His Ser Ala Ala Gly Gly Trp Leu Ile Leu
1 5 10 15

Gly Gly Ser Leu Leu Val Val Gly Val Ala His Pro Val Gly Leu Ala
20 25 30

Gly Gly Asp Asp Asp Ala Gly Val Val Gln Gln Pro Ile Glu Asp Ala
35 40 45

Gly Gly Gly Gly Val Leu Gly Gln Glu Ser Pro Pro Leu Phe Glu Gly
50 55 60

Pro Met Arg Gly Asp Gly Gln Gly Ala Ala Leu Val Ala Gly Ser His
65 70 75 80

Glu Pro Glu Gln Gln Leu Ser Pro Gly Val Val Glu Arg Gly Glu Ala
85 90 95

Asp Leu Val Gln Asp Asp Gln Ile Arg Ala Glu Gln Gly Val Asp Asp
100 105 110

Leu Ala Asp Gly Val Val Gly Gln Ala Ala Val Glu Asp Leu Asp Gln
115 120 125

Val Gly Gly Gly Glu Val Ala Asp Phe Glu Ser Gly Val Asp Gly Ser
130 135 140

Val Pro Ala Ala Asp Glu Gln Val Thr Phe Ala Arg Thr Arg Trp Ala
145 150 155 160

Asn Asp Arg Gln Val Leu Leu Cys Pro Asn Pro Phe Gln Ala Arg Gln
165 170 175

Val Val Glu Arg Gly Cys Gly Asp Arg Arg Ser Gly Asp Val Glu Pro
180 185 190

Val Glu Gly Leu Gly Asp Arg Glu Gly Cys Gly Leu Glu Thr Val Gly
195 200 205

Gly Val Gly Gly Ile Ala Gly Ser Asp Leu Gly Leu Asn Gln Arg Pro
210 215 220

Gln Asp Leu Leu Arg Cys Pro Ala Leu Arg Leu Gly Asp Leu Gln His
225 230 235 240

Leu Gly Gly Val Ala Ala His Arg Gly Gln Leu Gln Pro Pro Gln Arg
245 250 255

Arg Val Lys Val Ser Ser Gln Arg Cys Arg Arg Gly Arg Cys His Arg
260 265 270

Leu Gly Ser Gly Gly His Glu Ala Val Pro Ser Val Val Leu Ile Leu
275 280 285



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1739 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..945

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 945..1736



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
ATGGGCTGCC TCAAAGGTGG TGTCGTCGCC AATGTTGTTG TTCCAACACC GGATTATGTG 60

CGATTCGCGT CCCACTATGG CTTCGTTCCG GACTTCTGCC ACGGTGCGGA TCCGCAATCG 120

AAGGGCATCG TGGAGAACCT CTGTGGCTAC GCTCAGGACG ACCTTGCGGT GCCGCTGCTG 180

ACCGAAGCTG CGTTAGCCGG TGAGCAGGTC GACCTACGTG CCCTCAACGC CCAGGCGCAA 240

CTATGGTGCG CCGAGGTCAA TGCCACGGTC CACTCGGAGA TCTGCGCCGT GCCCAACGAT 300

CGCTTGGTTG ACGAGCGCAC CGTCTTGAGG GAGCTGCCCT CGCTGCGGCC GACGATCGGC 360

TCGGGGTCGG TGCGCCGTAA GGTCGACGGC CTCTCGTGCA TCCGTTACGG CTCAGCTCGT 420

TACTCGGTGC CTCAGCGGCT CGTCGGTGCC ACCGTGGCGG TGGTGGTCGA TCATGGCGCC 480

CTGATCCTGT TGGAACCTGC GACCGGTGTG ATCGTGGCCG AGCACGAGCT CGTCAGCCCA 540

GGTGAGGTGT CCATCCTCGA TGAACACTAC GACGGACCCA GACCCGCACC CTCGCGTGGT 600

CCTCGCCCGA AAACCCAAGC AGAGAAACGA TTCTGCGCAT TGGGAACCGA AGCGCAGCAG 660

TTCCTCGTCG GTGCTGCTGC GATCGGCAAC ACCCGACTGA AATCCGAACT CGACATTCTG 720

CTCGGCCTTG GCGCCGCCCA CGGCGAACAG GCTTTGATTG ACGCGCTGCG CCGGGCGGTT 780

GCGTTTCGCC GGTTCCGCGC TGCCGACGTG CGCTCGATCC TGGCCGCCGG CGCCGGCACC 840

CCACAACCCC GCCCCGCCGG CGACGCACTC GTGCTCGATC TGCCCACCGT CGAGACCCGC 900

TCGTTGGAGG CCTACAAGAT CAACACCACC GACGGGACGG CCTCATGACC ACCGCTGCCA 960

AGCCGGTGGC ACCGTCCTCG GCGGCACCGC TGGCTGCTGA CCTTGACGCG GCGCTGCGGC 1020

GGTTGAAGCT GGCCACGGTG CGCCGCAACG CCGCCGAGGT GTTGCAAGTC GCCAAGACGC 1080

AACGCTGGAC ACCGGAGGAG ATCCTGCGGA CGTTGGTTGA GGCCGAGATC GCTGCCCGCG 1140

ATGCCTCCAA CACCGCCAAC CGTCTCAAGG CCGCAGCCTT CCCGGTCACC AAGACCCTCG 1200

ACGGGTTCGA CGTCACCGGA TCGTCGATCA CCGCAGCCAC GTTCGACTAC CTGTCGAGCC 1260

TGGAATGGAT TCGGGCACAA CAGAACCTGG CGGTCATTGG CCCACCTGGT ACGGGCAAAA 1320

GTCACCTGCT CATCGGCTGC GGGCACGCTG CCGTCCACGC CGGATTCAAA GTCCGCTACT 1380

TCACCGCCGC CGACCTGATC GAGGTCCTCT ACCGCGGCCT GGCCGACAAC ACCGTCGGCA 1440

AGATCATCGA CACCCTGCTC CGCGCGGATC TGGTCATCTT GGACGAGATC GGCTTCGCCC 1500

CGCTCGACGA CACCGGGACT CAACTGTTGT TCCGGCTCGT GGCTGCCGGC TACGAGCGCC 1560

GCTCCCTGGC CATCGCCTCG CATTGGCCCT TCGAACAATG GGGGCGATTC CTGCCCGAGC 1620

ACACCACCGC CGCCAGCATC CTCGATCGGC TGCTGCACCA CGCCAGCATC GTCGTCACCT 1680

CCGGCGAGTC CTACCGGATG CGCCACGCCG ACCACAAGAA GGGAGCCGCC AAGAATTAG 1739



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 315 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Met Gly Cys Leu Lys Gly Gly Val Val Ala Asn Val Val Val Pro Thr
1 5 10 15

Pro Asp Tyr Val Arg Phe Ala Ser His Tyr Gly Phe Val Pro Asp Phe
20 25 30

Cys His Gly Ala Asp Pro Gln Ser Lys Gly Ile Val Glu Asn Leu Cys
35 40 45

Gly Tyr Ala Gln Asp Asp Leu Ala Val Pro Leu Leu Thr Glu Ala Ala
50 55 60

Leu Ala Gly Glu Gln Val Asp Leu Arg Ala Leu Asn Ala Gln Ala Gln
65 70 75 80

Leu Trp Cys Ala Glu Val Asn Ala Thr Val His Ser Glu Ile Cys Ala
85 90 95

Val Pro Asn Asp Arg Leu Val Asp Glu Arg Thr Val Leu Arg Glu Leu
100 105 110

Pro Ser Leu Arg Pro Thr Ile Gly Ser Gly Ser Val Arg Arg Lys Val
115 120 125

Asp Gly Leu Ser Cys Ile Arg Tyr Gly Ser Ala Arg Tyr Ser Val Pro
130 135 140

Gln Arg Leu Val Gly Ala Thr Val Ala Val Val Val Asp His Gly Ala
145 150 155 160

Leu Ile Leu Leu Glu Pro Ala Thr Gly Val Ile Val Ala Glu His Glu
165 170 175

Leu Val Ser Pro Gly Glu Val Ser Ile Leu Asp Glu His Tyr Asp Gly
180 185 190

Pro Arg Pro Ala Pro Ser Arg Gly Pro Arg Pro Lys Thr Gln Ala Glu
195 200 205

Lys Arg Phe Cys Ala Leu Gly Thr Glu Ala Gln Gln Phe Leu Val Gly
210 215 220

Ala Ala Ala Ile Gly Asn Thr Arg Leu Lys Ser Glu Leu Asp Ile Leu
225 230 235 240

Leu Gly Leu Gly Ala Ala His Gly Glu Gln Ala Leu Ile Asp Ala Leu
245 250 255

Arg Arg Ala Val Ala Phe Arg Arg Phe Arg Ala Ala Asp Val Arg Ser
260 265 270

Ile Leu Ala Ala Gly Ala Gly Thr Pro Gln Pro Arg Pro Ala Gly Asp
275 280 285

Ala Leu Val Leu Asp Leu Pro Thr Val Glu Thr Arg Ser Leu Glu Ala
290 295 300

Tyr Lys Ile Asn Thr Thr Asp Gly Thr Ala Ser
305 310 315



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 264 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Met Thr Thr Ala Ala Lys Pro Val Ala Pro Ser Ser Ala Ala Pro Leu
1 5 10 15

Ala Ala Asp Leu Asp Ala Ala Leu Arg Arg Leu Lys Leu Ala Thr Val
20 25 30

Arg Arg Asn Ala Ala Glu Val Leu Gln Val Ala Lys Thr Gln Arg Trp
35 40 45

Thr Pro Glu Glu Ile Leu Arg Thr Leu Val Glu Ala Glu Ile Ala Ala
50 55 60

Arg Asp Ala Ser Asn Thr Ala Asn Arg Leu Lys Ala Ala Ala Phe Pro
65 70 75 80

Val Thr Lys Thr Leu Asp Gly Phe Asp Val Thr Gly Ser Ser Ile Thr
85 90 95

Ala Ala Thr Phe Asp Tyr Leu Ser Ser Leu Glu Trp Ile Arg Ala Gln
100 105 110

Gln Asn Leu Ala Val Ile Gly Pro Pro Gly Thr Gly Lys Ser His Leu
115 120 125

Leu Ile Gly Cys Gly His Ala Ala Val His Ala Gly Phe Lys Val Arg
130 135 140

Tyr Phe Thr Ala Ala Asp Leu Ile Glu Val Leu Tyr Arg Gly Leu Ala
145 150 155 160

Asp Asn Thr Val Gly Lys Ile Ile Asp Thr Leu Leu Arg Ala Asp Leu
165 170 175

Val Ile Leu Asp Glu Ile Gly Phe Ala Pro Leu Asp Asp Thr Gly Thr
180 185 190

Gln Leu Leu Phe Arg Leu Val Ala Ala Gly Tyr Glu Arg Arg Ser Leu
195 200 205

Ala Ile Ala Ser His Trp Pro Phe Glu Gln Trp Gly Arg Phe Leu Pro
210 215 220

Glu His Thr Thr Ala Ala Ser Ile Leu Asp Arg Leu Leu His His Ala
225 230 235 240

Ser Ile Val Val Thr Ser Gly Glu Ser Tyr Arg Met Arg His Ala Asp
245 250 255

His Lys Lys Gly Ala Ala Lys Asn
260



(2) INFORMATION FOR SEQ ID NO: 30: 
(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 789 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

GTGACGTCTG CTCCGACCGT CTCGGTGATA ACGATCTCGT TCAACGACCT CGACGGGTTG 60

CAGCGCACGG TGAAAAGTGT GCGGGCGCAA CGCTACCGGG GACGCATCGA GCACATCGTA 120

ATCGACGGTG GCAGCGGCGA CGACGTGGTG GCATACCTGT CCGGGTGTGA ACCAGGCTTC 180

GCGTATTGGC AGTCCGAGCC CGACGGCGGG CGGTACGACG CGATGAACCA GGGCATCGCG 240

CACGCATCGG GTGATCTGTT GTGGTTCTTG CACTCCGCCG ATCGTTTTTC CGGGCCCGAC 300

GTGGTAGCCC AGGCCGTGGA GGCGCTATCC GGCAAGGGAC CGGTGTCCGA ATTGTGGGGC 360

TTCGGGATGG ATCGTCTCGT CGGGCTCGAT CGGGTGCGCG GCCCGATACC TTTCAGCCTG 420

CGCAAATTCC TGGCCGGCAA GCAGGTTGTT CCGCATCAAG CATCGTTCTT CGGATCATCG 480

CTGGTGGCCA AGATCGGTGG CTACGACCTT GATTTCGGGA TCGCCGCCGA CCAGGAATTC 540

ATATTGCGGG CCGCGCTGGT ATGCGAGCCG GTCACGATTC GGTGTGTGCT GTGCGAGTTC 600

GACACCACGG GCGTCGGCTC GCACCGGGAA CCAAGCGCGG TCTTCGGTGA TCTGCGCCGC 660

ATGGGCGACC TTCATCGCCG CTACCCGTTC GGGGGAAGGC GAATATCACA TGCCTACCTA 720

CGCGGCCGGG AGTTCTACGC CTACAACAGT CGATTCTGGG AAAACGTCTT CACGCGAATG 780

TCGAAATAG 789



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 252 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 



Met Thr Ser Ala Pro Thr Val Ser Val Ile Thr Ile Ser Phe Asn Asp
1 5 10 15

Leu Asp Gly Leu Gln Arg Thr Val Lys Ser Val Arg Ala Gln Arg Tyr
20 25 30

Arg Gly Arg Ile Glu His Ile Val Ile Asp Gly Gly Ser Gly Asp Asp
35 40 45

Val Val Ala Tyr Leu Ser Gly Cys Glu Pro Gly Phe Ala Tyr Trp Gln
50 55 60

Ser Glu Pro Asp Gly Gly Arg Tyr Asp Ala Met Asn Gln Gly Ile Ala
65 70 75 80

His Ala Ser Gly Asp Leu Leu Trp Phe Leu His Ser Ala Asp Arg Phe
85 90 95

Ser Gly Pro Asp Val Val Ala Gln Ala Val Glu Ala Leu Ser Gly Lys
100 105 110

Gly Pro Val Ser Glu Leu Trp Gly Phe Gly Met Asp Arg Leu Val Gly
115 120 125

Leu Asp Arg Val Arg Gly Pro Ile Pro Phe Ser Leu Arg Lys Phe Leu
130 135 140

Ala Gly Lys Gln Val Val Pro His Gln Ala Ser Phe Phe Gly Ser Ser
145 150 155 160

Leu Val Ala Lys Ile Gly Gly Tyr Asp Leu Asp Phe Gly Ile Ala Ala
165 170 175

Asp Gln Glu Phe Ile Leu Arg Ala Ala Leu Val Cys Glu Pro Val Thr
180 185 190

Ile Arg Cys Val Leu Cys Glu Phe Asp Thr Thr Gly Val Gly Ser His
195 200 205

Arg Glu Pro Ser Ala Val Phe Gly Asp Leu Arg Arg Met Gly Asp Leu
210 215 220

His Arg Arg Tyr Pro Phe Gly Gly Arg Arg Ile Ser His Ala Tyr Leu
225 230 235 240

Arg Gly Arg Glu Phe Tyr Ala Tyr Asn Ser Arg Phe Trp Glu Asn Val
245 250 255



GCGGCGCTGG AGTGCGAAGG CAAGCCGTGG ATCGACAAGC CGATGATCGC CGGCCGGACA 1020 
TGA 1023



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 340 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

Met Lys Arg Ala Leu Ile Thr Gly Ile Thr Gly Gln Asp Gly Ser Tyr
1 5 10 15

Leu Ala Glu Leu Leu Leu Ala Lys Gly Tyr Glu Val His Gly Leu Ile
20 25 30

Arg Arg Ala Ser Thr Phe Asn Thr Ser Arg Ile Asp His Leu Tyr Val
35 40 45

Asp Pro His Gln Pro Gly Ala Arg Leu Phe Leu His Tyr Gly Asp Leu
50 55 60

Ile Asp Gly Thr Arg Leu Val Thr Leu Leu Ser Thr Ile Glu Pro Asp
65 70 75 80

Glu Val Tyr Asn Leu Ala Ala Gln Ser His Val Arg Val Ser Phe Asp
85 90 95

Glu Pro Val His Thr Gly Asp Thr Thr Gly Met Gly Ser Met Arg Leu
100 105 110

Leu Glu Ala Val Arg Leu Ser Arg Val His Cys Arg Phe Tyr Gln Ala
115 120 125

Ser Ser Ser Glu Met Phe Gly Ala Ser Pro Pro Pro Gln Asn Glu Leu
130 135 140

Thr Pro Phe Tyr Pro Arg Ser Pro Tyr Gly Ala Ala Lys Val Tyr Ser
145 150 155 160

Tyr Trp Ala Thr Arg Asn Tyr Arg Glu Ala Tyr Gly Leu Phe Ala Val
165 170 175

Asn Gly Ile Leu Phe Asn His Glu Ser Pro Arg Arg Gly Glu Thr Phe
180 185 190

Val Thr Arg Lys Ile Thr Arg Ala Val Ala Arg Ile Lys Ala Gly Ile
195 200 205

Gln Ser Glu Val Tyr Met Gly Asn Leu Asp Ala Val Arg Asp Trp Gly
210 215 220

Tyr Ala Pro Glu Tyr Val Glu Gly Met Trp Arg Met Leu Gln Thr Asp
225 230 235 240

Glu Pro Asp Asp Phe Val Leu Ala Thr Gly Arg Gly Phe Thr Val Arg
245 250 255

Glu Phe Ala Arg Ala Ala Phe Glu His Ala Gly Leu Asp Trp Gln Gln
260 265 270

Tyr Val Lys Phe Asp Gln Arg Tyr Leu Arg Pro Thr Glu Val Asp Ser
275 280 285

Leu Ile Gly Asp Ala Thr Lys Ala Ala Glu Leu Leu Gly Trp Arg Ala
290 295 300

Ser Val His Thr Asp Glu Leu Ala Arg Ile Met Val Asp Ala Asp Met
305 310 315 320

Ala Ala Leu Glu Cys Glu Gly Lys Pro Trp Ile Asp Lys Pro Met Ile
325 330 335

Ala Gly Arg Thr
340

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 732 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..729



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 



ATG AGG CTG GCC CGT CGC GCT CGG AAC ATC TTG CGT CGC AAC GGC ATC 48
Met Arg Leu Ala Arg Arg Ala Arg Asn Ile Leu Arg Arg Asn Gly Ile
345 350 355

GAG GTG TCG CGC TAC TTT GCC GAA CTG GAC TGG GAA CGC AAT TTC TTG 96
Glu Val Ser Arg Tyr Phe Ala Glu Leu Asp Trp Glu Arg Asn Phe Leu
360 365 370

CGC CAA CTG CAA TCG CAT CGG GTC AGT GCC GTG CTC GAT GTC GGG GCC 144
Arg Gln Leu Gln Ser His Arg Val Ser Ala Val Leu Asp Val Gly Ala
375 380 385

AAT TCG GGG CAG TAC GCC AGG GGT CTG CGC GGC GCG GGC TTC GCG GGC 192
Asn Ser Gly Gln Tyr Ala Arg Gly Leu Arg Gly Ala Gly Phe Ala Gly
390 395 400

CGC ATC GTC TCG TTC GAG CCG CTG CCC GGG CCC TTT GCC GTC TTG CAG 240
Arg Ile Val Ser Phe Glu Pro Leu Pro Gly Pro Phe Ala Val Leu Gln
405 410 415 420

CGC AGC GCC TCC ACG GAC CCG TTG TGG GAA TGC CGG CGC TGT GCG CTG 288
Arg Ser Ala Ser Thr Asp Pro Leu Trp Glu Cys Arg Arg Cys Ala Leu
425 430 435

GGC GAT GTC GAT GGA ACC ATC TCG ATC AAC GTC GCC GGC AAC GAG GGC 336
Gly Asp Val Asp Gly Thr Ile Ser Ile Asn Val Ala Gly Asn Glu Gly
440 445 450

GCC AGC AGT TCC GTC TTG CCG ATG TTG AAA CGA CAT CAG GAC GCC TTT 384
Ala Ser Ser Ser Val Leu Pro Met Leu Lys Arg His Gln Asp Ala Phe
455 460 465

CCA CCA GCC AAC TAC GTG GGC GCC CAA CGG GTG CCG ATA CAT CGA CTC 432
Pro Pro Ala Asn Tyr Val Gly Ala Gln Arg Val Pro Ile His Arg Leu
470 475 480

GAT TCC GTG GCT GCA GAC GTT CTG CGG CCC AAC GAT ATT GCG TTC TTG 480
Asp Ser Val Ala Ala Asp Val Leu Arg Pro Asn Asp Ile Ala Phe Leu
485 490 495 500

AAG ATC GAC GTT CAA GGA TTC GAG AAG CAG GTG ATC GCG GGT GGC GAT 528
Lys Ile Asp Val Gln Gly Phe Glu Lys Gln Val Ile Ala Gly Gly Asp
505 510 515

TCA ACG GTG CAC GAC CGA TGC GTC GGC ATG CAG CTC GAG CTG TCT TTC 576
Ser Thr Val His Asp Arg Cys Val Gly Met Gln Leu Glu Leu Ser Phe
520 525 530

CAG CCG TTG TAC GAG GGT GGC ATG CTC ATC CGC GAG GCG CTC GAT CTC 624
Gln Pro Leu Tyr Glu Gly Gly Met Leu Ile Arg Glu Ala Leu Asp Leu
535 540 545

GTG GAT TCG TTG GGC TTT ACG CTC TCG GGA TTG CAA CCC GGT TTC ACC 672
Val Asp Ser Leu Gly Phe Thr Leu Ser Gly Leu Gln Pro Gly Phe Thr
550 555 560

GAC CCC CGC AAC GGT CGA ATG CTG CAG GCC GAT GGC ATC TTC TTC CGG 720
Asp Pro Arg Asn Gly Arg Met Leu Gln Ala Asp Gly Ile Phe Phe Arg
565 570 575 580

GGC AGC GAT TGA 732
Gly Ser Asp



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 243 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Met Arg Leu Ala Arg Arg Ala Arg Asn Ile Leu Arg Arg Asn Gly Ile
1 5 10 15

Glu Val Ser Arg Tyr Phe Ala Glu Leu Asp Trp Glu Arg Asn Phe Leu
20 25 30

Arg Gln Leu Gln Ser His Arg Val Ser Ala Val Leu Asp Val Gly Ala
35 40 45

Asn Ser Gly Gln Tyr Ala Arg Gly Leu Arg Gly Ala Gly Phe Ala Gly
50 55 60

Arg Ile Val Ser Phe Glu Pro Leu Pro Gly Pro Phe Ala Val Leu Gln
65 70 75 80

Arg Ser Ala Ser Thr Asp Pro Leu Trp Glu Cys Arg Arg Cys Ala Leu
85 90 95

Gly Asp Val Asp Gly Thr Ile Ser Ile Asn Val Ala Gly Asn Glu Gly
100 105 110

Ala Ser Ser Ser Val Leu Pro Met Leu Lys Arg His Gln Asp Ala Phe
115 120 125

Pro Pro Ala Asn Tyr Val Gly Ala Gln Arg Val Pro Ile His Arg Leu
130 135 140

Asp Ser Val Ala Ala Asp Val Leu Arg Pro Asn Asp Ile Ala Phe Leu
145 150 155 160

Lys Ile Asp Val Gln Gly Phe Glu Lys Gln Val Ile Ala Gly Gly Asp
165 170 175

Ser Thr Val His Asp Arg Cys Val Gly Met Gln Leu Glu Leu Ser Phe
180 185 190

Gln Pro Leu Tyr Glu Gly Gly Met Leu Ile Arg Glu Ala Leu Asp Leu
195 200 205

Val Asp Ser Leu Gly Phe Thr Leu Ser Gly Leu Gln Pro Gly Phe Thr
210 215 220

Asp Pro Arg Asn Gly Arg Met Leu Gln Ala Asp Gly Ile Phe Phe Arg
225 230 235 240

Gly Ser Asp



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 732 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1. .729 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
GTG AAA TCG TTG AAA CTC GCT CGT TTC ATC GCG CGT AGC GCC GCC TTC 48
Val Lys Ser Leu Lys Leu Ala Arg Phe Ile Ala Arg Ser Ala Ala Phe
245 250 255

GAG GTT TCG CGC CGC TAT TCT GAG CGA GAC CTG AAG CAC CAG TTT GTG 96
Glu Val Ser Arg Arg Tyr Ser Glu Arg Asp Leu Lys His Gln Phe Val
260 265 270 275

AAG CAA CTC AAA TCG CGT CGG GTA GAT GTC GTT TTC GAT GTC GGC GCC 144
Lys Gln Leu Lys Ser Arg Arg Val Asp Val Val Phe Asp Val Gly Ala
280 285 290

AAC TCA GGA CAA TAC GCC GCC GGC CTC CGC CGA GCA GCA TAT AAG GGC 192
Asn Ser Gly Gln Tyr Ala Ala Gly Leu Arg Arg Ala Ala Tyr Lys Gly
295 300 305

CGC ATT GTC TCG TTC GAA CCG CTA TCC GGA CCG TTT ACG ATC TTG GAA 240
Arg Ile Val Ser Phe Glu Pro Leu Ser Gly Pro Phe Thr Ile Leu Glu
310 315 320

AGC AAA GCG TCA ACG GAT CCA CTT TGG GAT TGC CGG CAG CAT GCG TTG 288
Ser Lys Ala Ser Thr Asp Pro Leu Trp Asp Cys Arg Gln His Ala Leu
325 330 335

GGC GAT TCT GAT GGA ACG GTT ACG ATC AAT ATC GCA GGA AAC GCC GGT 336
Gly Asp Ser Asp Gly Thr Val Thr Ile Asn Ile Ala Gly Asn Ala Gly
340 345 350 355

CAG AGC AGT TCC GTC TTG CCC ATG CTG AAA AGT CAT CAG AAC GCT TTT 384
Gln Ser Ser Ser Val Leu Pro Met Leu Lys Ser His Gln Asn Ala Phe
360 365 370

CCC CCG GCA AAC TAT GTC GGT ACC CAA GAG GCG TCC ATA CAT CGA CTT 432
Pro Pro Ala Asn Tyr Val Gly Thr Gln Glu Ala Ser Ile His Arg Leu
375 380 385

GAT TCC GTG GCG CCA GAA TTT CTA GGC ATG AAC GGT GTC GCT TTT CTC 480
Asp Ser Val Ala Pro Glu Phe Leu Gly Met Asn Gly Val Ala Phe Leu
390 395 400

AAG GTC GAC GTT CAA GGC TTT GAA AAG CAG GTG CTC GCC GGG GGC AAA 528
Lys Val Asp Val Gln Gly Phe Glu Lys Gln Val Leu Ala Gly Gly Lys
405 410 415

TCA ACC ATA GAT GAC CAT TGC GTC GGC ATG CAA CTC GAA CTG TCC TTC 576
Ser Thr Ile Asp Asp His Cys Val Gly Met Gln Leu Glu Leu Ser Phe
420 425 430 435

CTG CCG TTG TAC GAA GGT GGC ATG CTC ATT CCT GAA GCC CTC GAT CTC 624
Leu Pro Leu Tyr Glu Gly Gly Met Leu Ile Pro Glu Ala Leu Asp Leu
440 445 450

GTG TAT TCC TTG GGC TTC ACG TTG ACG GGA TTG CTG CCT TGT TTC ATT 672
Val Tyr Ser Leu Gly Phe Thr Leu Thr Gly Leu Leu Pro Cys Phe Ile
455 460 465

GAT GCA AAT AAT GGT CGA ATG TTG CAG GCC GAC GGC ATC TTT TTC CGC 720
Asp Ala Asn Asn Gly Arg Met Leu Gln Ala Asp Gly Ile Phe Phe Arg
470 475 480

GAG GAC GAT TGA 732
Glu Asp Asp
485



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 243 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

Val Lys Ser Leu Lys Leu Ala Arg Phe Ile Ala Arg Ser Ala Ala Phe
1 5 10 15

Glu Val Ser Arg Arg Tyr Ser Glu Arg Asp Leu Lys His Gln Phe Val
20 25 30

Lys Gln Leu Lys Ser Arg Arg Val Asp Val Val Phe Asp Val Gly Ala
35 40 45

Asn Ser Gly Gln Tyr Ala Ala Gly Leu Arg Arg Ala Ala Tyr Lys Gly
50 55 60

Arg Ile Val Ser Phe Glu Pro Leu Ser Gly Pro Phe Thr Ile Leu Glu
65 70 75 80

Ser Lys Ala Ser Thr Asp Pro Leu Trp Asp Cys Arg Gln His Ala Leu
85 90 95

Gly Asp Ser Asp Gly Thr Val Thr Ile Asn Ile Ala Gly Asn Ala Gly
100 105 110

Gln Ser Ser Ser Val Leu Pro Met Leu Lys Ser His Gln Asn Ala Phe
115 120 125

Pro Pro Ala Asn Tyr Val Gly Thr Gln Glu Ala Ser Ile His Arg Leu
130 135 140

Asp Ser Val Ala Pro Glu Phe Leu Gly Met Asn Gly Val Ala Phe Leu
145 150 155 160

Lys Val Asp Val Gln Gly Phe Glu Lys Gln Val Leu Ala Gly Gly Lys
165 170 175

Ser Thr Ile Asp Asp His Cys Val Gly Met Gln Leu Glu Leu Ser Phe
180 185 190

Leu Pro Leu Tyr Glu Gly Gly Met Leu Ile Pro Glu Ala Leu Asp Leu
195 200 205

Val Tyr Ser Leu Gly Phe Thr Leu Thr Gly Leu Leu Pro Cys Phe Ile
210 215 220

Asp Ala Asn Asn Gly Arg Met Leu Gln Ala Asp Gly Ile Phe Phe Arg
225 230 235 240

Glu Asp Asp



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 828 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..825



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

ATG GTG CAG ACG AAA CGA TAC GCC GGC TTG ACC GCA GCT AAC ACA AAG 48
Met Val Gln Thr Lys Arg Tyr Ala Gly Leu Thr Ala Ala Asn Thr Lys
245 250 255

AAA GTC GCC ATG GCC GCA CCA ATG TTT TCG ATC ATC ATC CCC ACC TTG 96
Lys Val Ala Met Ala Ala Pro Met Phe Ser Ile Ile Ile Pro Thr Leu
260 265 270 275

AAC GTG GCT GCG GTA TTG CCT GCC TGC CTC GAC AGC ATC GCC CGT CAG 144
Asn Val Ala Ala Val Leu Pro Ala Cys Leu Asp Ser Ile Ala Arg Gln
280 285 290

ACC TGC GGT GAC TTC GAG CTG GTA CTG GTC GAC GGC GGC TCG ACG GAC 192
Thr Cys Gly Asp Phe Glu Leu Val Leu Val Asp Gly Gly Ser Thr Asp
295 300 305

GAA ACC CTC GAC ATC GCC AAC ATT TTC GCC CCC AAC CTC GGC GAG CGG 240
Glu Thr Leu Asp Ile Ala Asn Ile Phe Ala Pro Asn Leu Gly Glu Arg
310 315 320

TTG ATC ATT CAT CGC GAC ACC GAC CAG GGC GTC TAC GAC GCC ATG AAC 288
Leu Ile Ile His Arg Asp Thr Asp Gln Gly Val Tyr Asp Ala Met Asn
325 330 335

CGC GGC GTG GAC CTG GCC ACC GGA ACG TGG TTG CTC TTT CTG GGC GCG 336
Arg Gly Val Asp Leu Ala Thr Gly Thr Trp Leu Leu Phe Leu Gly Ala
340 345 350 355

GAC GAC AGC CTG TAC GAG GCT GAC ACC CTG GCG CGG GTG GCC GCC TTC 384
Asp Asp Ser Leu Tyr Glu Ala Asp Thr Leu Ala Arg Val Ala Ala Phe
360 365 370

ATT GGC GAA CAC GAG CCC AGC GAT CTG GTA TAT GGC GAC GTG ATC ATG 432
Ile Gly Glu His Glu Pro Ser Asp Leu Val Tyr Gly Asp Val Ile Met
375 380 385

CGC TCA ACC AAT TTC CGC TGG GGT GGC GCC TTC GAC CTC GAC CGT CTG 480
Arg Ser Thr Asn Phe Arg Trp Gly Gly Ala Phe Asp Leu Asp Arg Leu
390 395 400

TTG TTC AAG CGC AAC ATC TGC CAT CAG GCG ATC TTC TAC CGC CGC GGA 528
Leu Phe Lys Arg Asn Ile Cys His Gln Ala Ile Phe Tyr Arg Arg Gly
405 410 415

CTC TTC GGC ACC ATC GGT CCC TAC AAC CTC CGC TAC CGG GTC CTG GCC 576
Leu Phe Gly Thr Ile Gly Pro Tyr Asn Leu Arg Tyr Arg Val Leu Ala
420 425 430 435

GAC TGG GAC TTC AAT ATT CGC TGC TTT TCC AAC CCA GCG CTC GTC ACC 624
Asp Trp Asp Phe Asn Ile Arg Cys Phe Ser Asn Pro Ala Leu Val Thr
440 445 450

CGC TAC ATG CAC GTG GTC GTT GCA AGC TAC AAC GAA TTC GGC GGG CTC 672
Arg Tyr Met His Val Val Val Ala Ser Tyr Asn Glu Phe Gly Gly Leu
455 460 465

AGC AAT ACG ATC GTC GAC AAG GAG TTT TTG AAG CGG CTG CCG ATG TCC 720
Ser Asn Thr Ile Val Asp Lys Glu Phe Leu Lys Arg Leu Pro Met Ser
470 475 480

ACG AGA CTC GGC ATA AGG CTG GTC ATA GTT CTG GTG CGC AGG TGG CCA 768
Thr Arg Leu Gly Ile Arg Leu Val Ile Val Leu Val Arg Arg Trp Pro
485 490 495

AAG GTG ATC AGC AGG GCC ATG GTA ATG CGC ACC GTC ATT TCT TGG CGG 816
Lys Val Ile Ser Arg Ala Met Val Met Arg Thr Val Ile Ser Trp Arg
500 505 510 515

CGC CGA CGT TAG 828
Arg Arg Arg



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 275 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

Met Val Gln Thr Lys Arg Tyr Ala Gly Leu Thr Ala Ala Asn Thr Lys
1 5 10 15

Lys Val Ala Met Ala Ala Pro Met Phe Ser Ile Ile Ile Pro Thr Leu
20 25 30

Asn Val Ala Ala Val Leu Pro Ala Cys Leu Asp Ser Ile Ala Arg Gln
35 40 45

Thr Cys Gly Asp Phe Glu Leu Val Leu Val Asp Gly Gly Ser Thr Asp
50 55 60

Glu Thr Leu Asp Ile Ala Asn Ile Phe Ala Pro Asn Leu Gly Glu Arg
65 70 75 80

Leu Ile Ile His Arg Asp Thr Asp Gln Gly Val Tyr Asp Ala Met Asn
85 90 95

Arg Gly Val Asp Leu Ala Thr Gly Thr Trp Leu Leu Phe Leu Gly Ala
100 105 110

Asp Asp Ser Leu Tyr Glu Ala Asp Thr Leu Ala Arg Val Ala Ala Phe
115 120 125

Ile Gly Glu His Glu Pro Ser Asp Leu Val Tyr Gly Asp Val Ile Met
130 135 140

Arg Ser Thr Asn Phe Arg Trp Gly Gly Ala Phe Asp Leu Asp Arg Leu
145 150 155 160

Leu Phe Lys Arg Asn Ile Cys His Gln Ala Ile Phe Tyr Arg Arg Gly
165 170 175

Leu Phe Gly Thr Ile Gly Pro Tyr Asn Leu Arg Tyr Arg Val Leu Ala
180 185 190

Asp Trp Asp Phe Asn Ile Arg Cys Phe Ser Asn Pro Ala Leu Val Thr
195 200 205

Arg Tyr Met His Val Val Val Ala Ser Tyr Asn Glu Phe Gly Gly Leu
210 215 220

Ser Asn Thr Ile Val Asp Lys Glu Phe Leu Lys Arg Leu Pro Met Ser
225 230 235 240

Thr Arg Leu Gly Ile Arg Leu Val Ile Val Leu Val Arg Arg Trp Pro
245 250 255

Lys Val Ile Ser Arg Ala Met Val Met Arg Thr Val Ile Ser Trp Arg
260 265 270

Arg Arg Arg
275

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 



GATGCCGTGA GGAGGTAAAG CTGC 24 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 



GATACGGCTC TTGAATCCTG CACG 24